Re: [RFC PATCH] shrink_dcache_parent() deadlock

From: Miklos Szeredi
Date: Tue Jan 10 2012 - 05:05:40 EST


On Tue, Jan 10, 2012 at 3:02 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Jan 9, 2012 at 5:34 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> Dave Chinner (1):
>>       dcache: use a dispose list in select_parent
>
> Hmm. Should this also have been marked for stable? I'm assuming that
> the possible deadlock (strictly speaking it's a livelock, I guess?)

I tested Dave's patch and the bug can still be easily reproduced.

And that's to be expected, as the intermediate "being on the lru"
state that Dave's patch eliminates doesn't play a fundamental part in
the mechanism of the livelock. It does eliminate one trylock, but
that's not the one critical to this bug (removing it is a very good
idea anyway).

The critical trylock here is the one in dentry_kill() which tries to
lock the parent. That is basically guaranteed to fail if there is
more than one instance of shrink_dcache_parent() running on the same
dentry, because select_parent() holds that parent lock for a
relatively long time.

With Dave's patch the livelock becomes like this (basically the same
as without the patch):

1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1

2 - CPU1: select_parent(P) locks P->d_lock

3 - CPU0: shrink_dentry_list() locks C->d_lock
          dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock

4 - CPU1: select_parent(P) locks C->d_lock,
          moves C from dispose list being processed on CPU0 to new
          dispose list, returns 1

5 - CPU0: shrink_dentry_list() finds dispose list empty, returns

6 - Goto 2 with CPU0 and CPU1 switched
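
To make the interleaving above concrete, here is a minimal single-threaded
C model of it. This is only a sketch, not kernel code: struct dentry_model,
struct sim_result, model_trylock(), model_unlock() and simulate_shrinkers()
are hypothetical names made up for illustration, d_lock is reduced to a
boolean, and the scheduling is hard-coded to the order of steps 1-6.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a dentry; NOT the kernel's struct dentry. */
struct dentry_model {
	bool locked;	/* models d_lock */
	int on_list;	/* CPU whose dispose list holds it, -1 if none */
};

struct sim_result {
	int failed_trylocks;	/* failed parent trylocks in dentry_kill() */
	bool child_killed;	/* did anyone manage to kill C? */
};

static bool model_trylock(struct dentry_model *d)
{
	if (d->locked)
		return false;
	d->locked = true;
	return true;
}

static void model_unlock(struct dentry_model *d)
{
	d->locked = false;
}

/*
 * Replays steps 1-6 for at most 'rounds' iterations. With two_shrinkers
 * false, only one CPU runs and nobody else holds P->d_lock.
 */
struct sim_result simulate_shrinkers(int rounds, bool two_shrinkers)
{
	struct dentry_model P = { false, -1 };
	struct dentry_model C = { false, -1 };
	struct sim_result res = { 0, false };
	int a = 0, b = 1;	/* 'a' runs shrink_dentry_list(), 'b' select_parent() */
	bool ok;

	C.on_list = a;		/* step 1: C goes on CPU a's dispose list */

	for (int i = 0; i < rounds && !res.child_killed; i++) {
		if (two_shrinkers) {
			/* step 2: CPU b, select_parent(P) locks P->d_lock */
			ok = model_trylock(&P);
			assert(ok);
		}

		/* step 3: CPU a, shrink_dentry_list() locks C->d_lock */
		ok = model_trylock(&C);
		assert(ok);
		/* dentry_kill(C) then only *tries* to lock the parent */
		if (model_trylock(&P)) {
			res.child_killed = true;
			C.on_list = -1;
			model_unlock(&P);
		} else {
			res.failed_trylocks++;
		}
		model_unlock(&C);

		if (two_shrinkers) {
			/* step 4: CPU b locks C, moves it to its own list */
			ok = model_trylock(&C);
			assert(ok);
			C.on_list = b;
			model_unlock(&C);
			model_unlock(&P);
			/* step 5: CPU a finds its dispose list empty, returns */
			/* step 6: goto 2 with the CPUs switched */
			int tmp = a;
			a = b;
			b = tmp;
		}
	}
	(void)ok;	/* silence unused warning when built with NDEBUG */
	return res;
}
```

In this model, simulate_shrinkers(1000, true) ends with 1000 failed
trylocks and C still alive, while simulate_shrinkers(1000, false) kills C
on the first pass. The real race of course depends on timing; the model
only shows that nothing in the sequence itself forces progress.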

Thanks,
Miklos