Re: [git pull] vfs.git - including i_mutex wrappers

From: Al Viro
Date: Sat Jan 23 2016 - 18:10:33 EST


On Sun, Jan 24, 2016 at 09:44:35AM +1100, Dave Chinner wrote:

> FWIW, I'm not opposed to making such a locking change - I'm more
> concerned about the fact I'm finding out about plans for such a
> fundamental locking change from a pull request on the last day of a
> merge window....

Look at the commit message (*and* the pull request posting) of an earlier
vfs.git pull request at the beginning of this window. Or look at the thread
back in May where it was first proposed (and pretty much the same patch was
generated and posted by Linus). The changes needed for parallel ->lookup()
were discussed there; it was a side branch of one of the RCU symlink threads,
and ISTR your own postings in it.

For filesystems it will be mostly transparent, except for the possibility of
parallel calls of ->lookup() on different names in the same directory.
Which XFS shouldn't give a fuck about, unless I'm seriously misreading your
code.

Basic scheme: have dentries under ->lookup() marked as such and inserted into
the hash (still negative, obviously) before calling ->lookup(). The method
itself is called with the ->i_mutex replacement taken shared; anyone running
into such a dentry in dcache lookup will wait (on the parent directory's
->i_mutex queue, explicitly kicked once ->lookup() is done) and repeat the
dcache lookup. In the cases where the current code would've silently freed the
->lookup() argument (error or "I've used an existing dentry"), the thing will
be unhashed and dropped without ever losing the "it's under lookup" flag.
Primitives like d_splice_alias() would remove the flag in question.

Anyone running into such a sucker in RCU mode should treat it as "dcache miss,
need to fall back to non-lazy mode". The flag (like all dentry flags) is
protected by ->d_lock.
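
The RCU-mode rule boils down to giving the same answer for "not there" and
"there, but still under lookup". A minimal sketch, with made-up names
(lazy_*, LAZY_IN_LOOKUP) and operating on a snapshot of the dentry; how a
lockless walker gets and validates that snapshot is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define LAZY_IN_LOOKUP 1u	/* hypothetical name for the "under lookup" flag */

struct lazy_dentry {
	unsigned flags;
	int ino;		/* 0: negative */
};

enum lazy_result {
	LAZY_HIT,		/* usable without leaving lazy mode */
	LAZY_UNLAZY		/* fall back to non-lazy mode and retry there */
};

/* One step of a lazy walk; d == NULL means nothing was found in the hash. */
enum lazy_result lazy_step(const struct lazy_dentry *d, int *ino)
{
	assert(ino != NULL);
	if (!d || (d->flags & LAZY_IN_LOOKUP))
		return LAZY_UNLAZY;	/* absent and in-lookup look the same */
	*ino = d->ino;
	return LAZY_HIT;
}
```

The point of folding the two cases together is that the lazy walker never
sleeps; the waiting is left to whoever retries in non-lazy mode.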

If a filesystem simply wants to preserve the existing exclusion, it should
add a private per-inode mutex and take it in its ->lookup() instance; all
other methods will still get exclusion on ->i_mutex replacement.

There will be interesting prereqs, but for XFS it's a non-issue. Now,
something like ceph or lustre... <shudder> Again, for XFS (for any
normal Unix filesystem, really) no extra exclusion should be needed.

readdir() is another potential target for weaker exclusion (i.e. switching
it to taking that thing shared), but that's a separate story and I'd prefer
to deal with ->lookup() first. There are potentially hairy issues around
the instances that pre-seed dcache and I don't want to mix them into the
initial series.