[PATCHSET][RFC][CFT] parallel lookups

From: Al Viro
Date: Fri Apr 15 2016 - 20:52:41 EST


The thing appears to be working. It's in vfs.git#work.lookups; the
last 5 commits are the infrastructure (fs/namei.c and fs/dcache.c; no changes
in fs/*/*) + actual switch to rwsem.

The missing bits: down_write_killable() (a series introducing just that
has been posted; for now I've replaced the mutex_lock_killable() calls with
plain inode_lock() - they are not critical for any testing, and as soon as
down_write_killable() gets there I'll switch those over), the lockdep bits
might need corrections, and right now it's only for lookups.

	I'm going to add readdir to the mix. The primitive added in this
series (d_alloc_parallel()) will need to be used in the dcache pre-seeding
paths, the ncpfs use of dentry_update_name_case() will need to be changed to
something less hacky, and the syscalls calling iterate_dir() will need to
switch to fdget_pos() (with FMODE_ATOMIC_POS set for directories as well
as regular files). The last bit is needed for exclusion at the struct file
level - there's a bunch of cases where we maintain data structures
hanging off file->private_data, and those really need to be serialized.
Besides, serializing ->f_pos updates is needed for sane semantics; right
now we tend to use ->i_mutex for that, but it would be easier to go for the
same mechanism as for regular files. With any luck we'll have working
parallel readdir in addition to parallel lookups in this cycle as well.

	The patchset is on top of switching getxattr to passing dentry and
inode separately; that part will get changes (in particular, the stuff
agruen has posted lately), but the lookups queue proper cares only about
being able to move security_d_instantiate() to the point before the dentry
is attached to its inode.

1/15: security_d_instantiate(): move to the point prior to attaching dentry
to inode. Depends on the getxattr changes; allows doing the "attach to inode"
and "add to dentry hash" parts without dropping ->d_lock in between.
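If I read 1/15 right, the reordering can be sketched like this (my sketch, not the actual diff; i_lock details and error paths elided):

```
/* sketch only - not the actual patch */
before:                                 after:
  take ->d_lock                           security_d_instantiate(dentry, inode)
  attach dentry to inode                  take ->d_lock
  drop ->d_lock                           attach dentry to inode
  security_d_instantiate(dentry, inode)   add dentry to hash
  take ->d_lock                           drop ->d_lock
  add dentry to hash
  drop ->d_lock
```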

2/15 -- 8/15: preparations - stuff similar to what went in during the last
cycle: several places switched to lookup_one_len_unlocked(), a bunch of
direct manipulations of ->i_mutex replaced with the inode_lock() etc. helpers.

kernfs: use lookup_one_len_unlocked().
configfs_detach_prep(): make sure that wait_mutex won't go away
ocfs2: don't open-code inode_lock/inode_unlock
orangefs: don't open-code inode_lock/inode_unlock
reiserfs: open-code reiserfs_mutex_lock_safe() in reiserfs_unpack()
reconnect_one(): use lookup_one_len_unlocked()
ovl_lookup_real(): use lookup_one_len_unlocked()

9/15: lookup_slow(): bugger off on IS_DEADDIR() from the very beginning
open-code real_lookup() call in lookup_slow(), move IS_DEADDIR check upwards.

10/15: __d_add(): don't drop/regain ->d_lock
that's what 1/15 had been for; might make sense to reorder closer to it.

11/15 -- 14/15: actual machinery for parallel lookups. This stuff could've
been a single commit, along with the actual switch to rwsem and shared lock
in lookup_slow(), but it's easier to review if carved up like that. From the
testing POV it's one chunk - it is bisect-safe, but the added code really
comes into play only after we go for shared lock, which happens in 15/15.
That's the core of the series.

beginning of transition to parallel lookups - marking in-lookup dentries
parallel lookups machinery, part 2
parallel lookups machinery, part 3
parallel lookups machinery, part 4 (and last)
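In rough outline (my sketch of the mechanism as described, not the real code; the control flow and helper names below are only approximate), d_alloc_parallel() gives concurrent lookups of the same name something to find and wait on:

```
/* sketch only */
d_alloc_parallel(parent, name, wq):
	new = d_alloc(parent, name)
	if (another lookup of the same name in the same parent
	    already has an in-lookup dentry)
		wait for that lookup to finish, discard new,
		return the winner
	else
		mark new in-lookup, make it findable by other
		in-flight lookups, return new

/* the caller then does ->lookup(); d_lookup_done() clears the
 * in-lookup state and wakes anyone who piled up behind it */
```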

15/15: parallel lookups: actual switch to rwsem

	Note that filesystems would be free to switch some of their own uses
of inode_lock() to grabbing it shared - it's really up to them. This series
takes it shared only for directory lookups, but the field has become an rwsem
for all inodes. XFS folks in particular might be interested in using it...

I'll post the individual patches in followups. Again, this is also available
in vfs.git #work.lookups (head at e2d622a right now). The thing survives
LTP and xfstests without regressions, but more testing would certainly be
appreciated. So would review, of course.