Re: Similar behaviour without BUG() message(was: Re: 2.6.5-aa3: kernel BUG at mm/objrmap.c:137!)

From: Alexander Y. Fomichev
Date: Fri Apr 23 2004 - 04:26:03 EST


On Friday 23 April 2004 06:27, Andrea Arcangeli wrote:
> following your trace I got to some futex issue (likely invoked by NPTL),
> and following the futex code I noticed a race with threads in
> expand_stack and likely the futex is effectively calling expand_stack
> too. Then I found a race in expand_stack with threads.
>
> here the details:
>
> in 2.6-aa expand_stack is moving down vm_start and vm_end with only the
> read semaphore held. This means it can even run in parallel on
> both CPUs; the latter will corrupt vm_pgoff, which causes malfunctions
> with anon-vma:
>
> vma->vm_start = address;
> vma->vm_pgoff -= grow;
>
> this isn't an issue for the file mappings because only anon vmas can be
> growsdown (the filemappings could never work right in filemap_nopage if
> expand_stack would mess with pgoff and vm_start without holding the
> mmap_sem in write mode).
>
> serializing this with anon-vma is easy and scalable with the
> anon_vma_lock (not an mm-wide lock like the page_table_lock).
> This will also serialize perfectly with the objrmap. objrmap should be
> the only thing that cares about vm_pgoff and vm_start being coherent at
> the same time for anon-mappings in anon-vma, and the anon_vma_lock
> provides perfect locking for that.
>
> One of the beauties of the anon-vma locking, in both scalability and
> simplicity, is the total avoidance of page_table_lock for vma merging
> and all other vma operations in general, and true usage of
> page_table_lock only for the pagetables (and future usage of
> vma->page_table_lock only for pagetables too). I wouldn't really like to
> give up on this and reintroduce the whole mess of page_table_lock in
> the vma merging and in expand_stack that all other implementations like
> 2.6 mainline and anonmm are suffering from, and I hope my fix will be
> enough. Could you test if this patch helps?

Of course, but it will take some time. I got two consecutive oopses
on the previous kernel after almost two weeks of stable operation, so I
think a week or two would be a long enough "timeslice" to say something.
Unfortunately I can't run artificial tests on a production system, but I
guess ordinary load will be enough.
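
For reference, the race described in the quoted text can be pictured with
a small userspace sketch. This is not the kernel code and not the patch
under test: struct sketch_vma, sketch_expand_stack() and the pthread mutex
standing in for the anon_vma_lock are all hypothetical names made up for
illustration. It only shows the idea that vm_start and vm_pgoff have to be
updated together under one lock, with the address rechecked after the
lock is taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12UL
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical, much-reduced stand-in for a grows-down anonymous vma. */
struct sketch_vma {
    unsigned long vm_start;   /* lowest mapped address, page aligned */
    unsigned long vm_end;     /* end of the mapping, never moved here */
    unsigned long vm_pgoff;   /* page offset that must track vm_start */
    pthread_mutex_t lock;     /* assumption: plays the role of anon_vma_lock */
};

/* vm_pgoff and vm_start are coherent while this difference stays constant. */
static unsigned long sketch_pgoff_base(const struct sketch_vma *vma)
{
    return vma->vm_pgoff - (vma->vm_start >> SKETCH_PAGE_SHIFT);
}

/* Grow the vma down to cover 'address', serialized by the per-vma lock. */
static void sketch_expand_stack(struct sketch_vma *vma, unsigned long address)
{
    unsigned long grow;

    assert(vma != NULL);
    address &= ~(SKETCH_PAGE_SIZE - 1);

    pthread_mutex_lock(&vma->lock);
    /* Recheck under the lock: another thread may already have grown it. */
    if (address < vma->vm_start) {
        grow = (vma->vm_start - address) >> SKETCH_PAGE_SHIFT;
        vma->vm_start = address;
        vma->vm_pgoff -= grow;
    }
    pthread_mutex_unlock(&vma->lock);
}

struct sketch_fault {
    struct sketch_vma *vma;
    unsigned long address;
};

static void *sketch_fault_thread(void *arg)
{
    struct sketch_fault *f = arg;

    sketch_expand_stack(f->vma, f->address);
    return NULL;
}

/* Simulate two threads faulting below the stack at once; 0 on success. */
static int sketch_two_faults(struct sketch_vma *vma,
                             unsigned long a, unsigned long b)
{
    struct sketch_fault fa = { vma, a };
    struct sketch_fault fb = { vma, b };
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, sketch_fault_thread, &fa) != 0)
        return -1;
    if (pthread_create(&tb, NULL, sketch_fault_thread, &fb) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

Without the lock and the recheck, both threads could compute "grow" from
the same old vm_start and subtract it twice from vm_pgoff, which is the
kind of corruption described above. With them, whichever thread runs
second either grows the vma by the remaining distance or does nothing.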

--
Best regards.
Alexander Y. Fomichev <gluk@xxxxxxx>
Public PGP key: http://sysadminday.org.ru/gluk.asc