Re: [BUG] Lockless patches cause hardlock under heavy IO

From: Ryan Hope
Date: Tue Jun 24 2008 - 11:12:20 EST


Well, I tried to run pure -mm this weekend. It locked up as soon as I
got into GNOME, so I applied a couple of the bug fixes from lkml, and
-mm seems to be running stable now. I can't seem to get it to hard lock
now, at least not doing the simple stuff that was causing it to hard
lock on my other patchset. So either the lockless patches expose some
bug that is in -rc6, or lockless requires some other patches further up
in the -mm series file.

On Mon, Jun 23, 2008 at 8:13 PM, Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
> On Monday 23 June 2008 23:05, Paul E. McKenney wrote:
>> On Mon, Jun 23, 2008 at 09:54:52PM +1000, Nick Piggin wrote:
>> > On Monday 23 June 2008 13:51, Ryan Hope wrote:
>> > > Well, I get the hardlock on -mm without using reiser4; I am pretty
>> > > sure it is swap related
>> >
>> > The guys seeing hangs don't use PREEMPT_RCU, do they?
>> >
>> > In my swapping tests, I found -mm3 to be stable with classic RCU, but
>> > on a hunch, I tried PREEMPT_RCU and it crashed a couple of times rather
>> > quickly. First crash was in find_get_pages so I suspected lockless
>> > pagecache doing something subtly wrong with the RCU API, but I just got
>> > another crash in __d_lookup:
>>
>> Could you please send me a repeat-by? (At least Alexey is no longer
>> alone!)
>
> OK, I had DEBUG_PAGEALLOC in the .config, which I think is probably
> important to reproduce it (but the fact that I'm reproducing oopses
> with << PAGE_SIZE objects like dentries and radix tree nodes indicates
> that there is even more free-before-grace activity going undetected --
> if you construct a test case using full pages, it might become even
> easier to detect with DEBUG_PAGEALLOC).
>
> 2 socket, 8 core x86 system.
>
> I mounted two tmpfs filesystems, one contains a single large file
> which is formatted as 1K block size ext3 and mounted loopback, the
> other is used directly. Linux kernel source is unpacked on each mount
> and concurrent make -j128 on each. This pushes it pretty hard into
> swap. Classic RCU survived another 5 hours of this last night.
>
> But that's a fairly convoluted test for an RCU problem. I expect it
> should be easier to trigger with something more targeted...
>
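
For anyone who wants to try Nick's setup, the steps he describes above
could be sketched roughly as follows. This is only a dry-run planner
that builds the command lines, it does not run anything. The base
directory, tmpfs size, image size, tarball name and the "make defconfig"
step are all assumptions made for illustration; none of them come from
Nick's mail, and the sizes need tuning so the workload really spills
into swap on a given machine.

```python
import shlex


def build_repro_plan(base="/mnt/rcu-test", tmpfs_size="8g",
                     image_size_mb=4096, jobs=128,
                     tarball="linux.tar.bz2"):
    """Build (but do not run) the commands for the two-tmpfs test.

    All default values are hypothetical placeholders.
    """
    direct = f"{base}/direct"    # tmpfs used directly
    backing = f"{base}/backing"  # tmpfs holding the single large file
    loop = f"{base}/loop"        # loopback mount of the ext3 image
    image = f"{backing}/ext3.img"
    trees = [f"{direct}/src", f"{loop}/src"]

    setup = [
        ["mkdir", "-p", direct, backing, loop],
        ["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", direct],
        ["mount", "-t", "tmpfs", "-o", f"size={tmpfs_size}", "tmpfs", backing],
        # One large file, formatted as ext3 with a 1K block size.
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M",
         f"count={image_size_mb}"],
        ["mkfs.ext3", "-F", "-b", "1024", image],
        ["mount", "-o", "loop", image, loop],
    ]
    for tree in trees:
        setup.append(["mkdir", "-p", tree])
        # Unpack kernel source on each mount (GNU tar assumed).
        setup.append(["tar", "-xjf", tarball, "-C", tree,
                      "--strip-components=1"])
        # Assumption: some .config is needed before building.
        setup.append(["make", "-C", tree, "defconfig"])

    # The two builds are meant to run at the same time.
    concurrent = [["make", "-C", tree, f"-j{jobs}"] for tree in trees]
    return {"setup": setup, "concurrent": concurrent}


def render(plan):
    """Render the plan as shell lines; concurrent jobs are backgrounded."""
    lines = [shlex.join(cmd) for cmd in plan["setup"]]
    lines += [shlex.join(cmd) + " &" for cmd in plan["concurrent"]]
    lines.append("wait")
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_repro_plan())))
```

The printed lines would have to be run as root on a kernel built with
the options under test (DEBUG_PAGEALLOC, and PREEMPT_RCU versus classic
RCU), on a box with swap enabled.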