Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

From: Al Viro
Date: Fri May 15 2015 - 23:17:14 EST


On Fri, May 15, 2015 at 07:23:11PM -0700, Linus Torvalds wrote:

> For filesystems that say that they are ok with it, make lookup_slow()
> (and *only* lookup_slow for now) instead take the rwsem for reading,
> but in addition to that, take a hashed mutex.
>
> By "hashed mutex", I mean having a smallish table of mutexes (say,
> 1024), and just creating a hash based on the name-hash and the parent
> pointer. That way we can avoid all the issues with adding a new lock
> to the dentry itself, or having to allocate a new child dentry just
> for the lock. It *could* cause some cross-directory serialization due
> to hash collisions, but that shouldn't be noticeable if the hash is of
> a reasonable size and quality.
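
For concreteness, the hashed-mutex table described above could look roughly
like the userspace sketch below. It uses pthread mutexes rather than kernel
ones, and every name in it (lookup_locks, lookup_lock_index(),
lookup_lock_for(), the multiplier used for mixing) is made up for
illustration and not taken from any patch:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define LOOKUP_LOCK_BITS  10                       /* 2^10 = 1024 slots */
#define LOOKUP_LOCK_COUNT (1u << LOOKUP_LOCK_BITS)

/* Hypothetical global table of locks shared by all directories. */
static pthread_mutex_t lookup_locks[LOOKUP_LOCK_COUNT];

static void lookup_locks_init(void)
{
	for (size_t i = 0; i < LOOKUP_LOCK_COUNT; i++)
		pthread_mutex_init(&lookup_locks[i], NULL);
}

/* Mix the parent pointer with the name hash and keep the top bits. */
static unsigned int lookup_lock_index(const void *parent, uint32_t name_hash)
{
	uint64_t h = (uint64_t)(uintptr_t)parent;

	h ^= name_hash;
	h *= 0x9E3779B97F4A7C15ull;	/* arbitrary odd 64-bit multiplier */
	return (unsigned int)(h >> (64 - LOOKUP_LOCK_BITS));
}

/* Pick the lock that serializes lookups of (parent, name). */
static pthread_mutex_t *lookup_lock_for(const void *parent, uint32_t name_hash)
{
	unsigned int idx = lookup_lock_index(parent, name_hash);

	assert(idx < LOOKUP_LOCK_COUNT);
	return &lookup_locks[idx];
}
```

A lookup would hold lookup_lock_for(parent, name_hash) around the slow path.
Two unrelated names that land in the same slot would serialize against each
other, which is the collision cost mentioned in the quoted text.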

What for? All we need is a flag, a waitqueue, and a wakeup when the
flag gets cleared. So let's just use the wait queue of the parent's
->i_mutex and explicitly kick it when clearing the dentry flag. We *are*
holding a reference on the parent (we need that to hold that sucker shared,
after all), so it's not going away under us...
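
As a rough userspace analogue only, the flag-plus-waitqueue idea can be
sketched with a condition variable standing in for the parent's wait queue.
This is not kernel code; struct parent_dir, struct child_entry, the
in_lookup flag and the helper names are all hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the parent directory; the caller holds a reference to it. */
struct parent_dir {
	pthread_mutex_t lock;	/* protects the in_lookup flag of children */
	pthread_cond_t waitq;	/* stand-in for the parent's wait queue */
};

/* Stand-in for a child dentry with a "lookup in progress" flag. */
struct child_entry {
	struct parent_dir *parent;
	bool in_lookup;
};

static void parent_init(struct parent_dir *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->waitq, NULL);
}

/* Sleep on the parent's queue until the child's flag is cleared. */
static void wait_for_lookup(struct child_entry *c)
{
	struct parent_dir *p = c->parent;

	pthread_mutex_lock(&p->lock);
	while (c->in_lookup)
		pthread_cond_wait(&p->waitq, &p->lock);
	pthread_mutex_unlock(&p->lock);
}

/* Clear the flag and explicitly kick everyone waiting on the parent. */
static void finish_lookup(struct child_entry *c)
{
	struct parent_dir *p = c->parent;

	pthread_mutex_lock(&p->lock);
	assert(c->in_lookup);
	c->in_lookup = false;
	pthread_cond_broadcast(&p->waitq);
	pthread_mutex_unlock(&p->lock);
}

/* Thread entry point that just waits for the lookup to finish. */
static void *waiter_thread(void *arg)
{
	wait_for_lookup(arg);
	return NULL;
}
```

The wakeup is a broadcast because waiters for different children share the
one queue on the parent; each waiter rechecks its own flag and goes back to
sleep if it is still set.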

I'm all for gradual transformations, but in this case I suspect
that doing it on a per-fs basis isn't the best way to go. A gradual massage
of the code that does dcache lookups or walks the lists of children in
filesystems would seem to be a better approach. Fortunately, such code is
fairly rare these days, and we only need to care about the places that
check whether such a beast is hashed; d_alloc() already places the new
dentry on the list of children. We'd also need to audit the
tree-walking-related code in fs/dcache.c itself, but that's much more
limited.