Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

From: Ian Kent
Date: Thu Dec 17 2020 - 05:17:07 EST


On Thu, 2020-12-17 at 16:54 +0800, Fox Chen wrote:
> On Thu, Dec 17, 2020 at 12:46 PM Ian Kent <raven@xxxxxxxxxx> wrote:
> > On Tue, 2020-12-15 at 20:59 +0800, Ian Kent wrote:
> > > On Tue, 2020-12-15 at 16:33 +0800, Fox Chen wrote:
> > > > On Mon, Dec 14, 2020 at 9:30 PM Ian Kent <raven@xxxxxxxxxx>
> > > > wrote:
> > > > > On Mon, 2020-12-14 at 14:14 +0800, Fox Chen wrote:
> > > > > > On Sun, Dec 13, 2020 at 11:46 AM Ian Kent <raven@xxxxxxxxxx
> > > > > > >
> > > > > > wrote:
> > > > > > > On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > > > > > > > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > > > > > > > For the patches, there is a mutex_lock in kn-
> > > > > > > > > > > attr_mutex,
> > > > > > > > > > as
> > > > > > > > > > Tejun
> > > > > > > > > > mentioned here
> > > > > > > > > > (
> > > > > > > > > > https://lore.kernel.org/lkml/X8fe0cmu+aq1gi7O@xxxxxxxxxxxxxxx/
> > > > > > > > > > ),
> > > > > > > > > > maybe a global
> > > > > > > > > > rwsem for kn->iattr will be better??
> > > > > > > > >
> > > > > > > > > I wasn't sure about that, IIRC a spin lock could be
> > > > > > > > > used
> > > > > > > > > around
> > > > > > > > > the
> > > > > > > > > initial check and checked again at the end which
> > > > > > > > > would
> > > > > > > > > probably
> > > > > > > > > have
> > > > > > > > > been much faster but much less conservative and a bit
> > > > > > > > > more
> > > > > > > > > ugly
> > > > > > > > > so
> > > > > > > > > I just went the conservative path since there was so
> > > > > > > > > much
> > > > > > > > > change
> > > > > > > > > already.
> > > > > > > >
> > > > > > > > Sorry, I hadn't looked at Tejun's reply yet and TBH
> > > > > > > > didn't
> > > > > > > > remember
> > > > > > > > it.
> > > > > > > >
> > > > > > > > Based on what Tejun said it sounds like that needs
> > > > > > > > work.
> > > > > > >
> > > > > > > Those attribute handling patches were meant to allow
> > > > > > > taking
> > > > > > > the
> > > > > > > rw
> > > > > > > sem read lock instead of the write lock for
> > > > > > > kernfs_refresh_inode()
> > > > > > > updates, with the added locking to protect the inode
> > > > > > > attributes
> > > > > > > update since it's called from the VFS both with and
> > > > > > > without
> > > > > > > the
> > > > > > > inode lock.
> > > > > >
> > > > > > Oh, understood. I was asking also because lock on kn-
> > > > > > > attr_mutex
> > > > > > drags
> > > > > > concurrent performance.
> > > > > >
> > > > > > > Looking around it looks like kernfs_iattrs() is called
> > > > > > > from
> > > > > > > multiple
> > > > > > > places without a node database lock at all.
> > > > > > >
> > > > > > > I'm thinking that, to keep my proposed change straight
> > > > > > > forward
> > > > > > > and on topic, I should just leave kernfs_refresh_inode()
> > > > > > > taking
> > > > > > > the node db write lock for now and consider the
> > > > > > > attributes
> > > > > > > handling
> > > > > > > as a separate change. Once that's done we could
> > > > > > > reconsider
> > > > > > > what's
> > > > > > > needed to use the node db read lock in
> > > > > > > kernfs_refresh_inode().
> > > > > >
> > > > > > You meant taking write lock of kernfs_rwsem for
> > > > > > kernfs_refresh_inode()??
> > > > > > It may be a lot slower in my benchmark, let me test it.
> > > > >
> > > > > Yes, but make sure the write lock of kernfs_rwsem is being
> > > > > taken
> > > > > not the read lock.
> > > > >
> > > > > That's a mistake I had initially?
> > > > >
> > > > > Still, that attributes handling is, I think, sufficient to
> > > > > warrant
> > > > > a separate change since it looks like it might need work, the
> > > > > kernfs
> > > > > node db probably should be kept stable for those attribute
> > > > > updates
> > > > > but equally the existence of an instantiated dentry might
> > > > > mitigate
> > > > > the it.
> > > > >
> > > > > Some people might just know whether it's ok or not but I
> > > > > would
> > > > > like
> > > > > to check the callers to work out what's going on.
> > > > >
> > > > > In any case it's academic if GCH isn't willing to consider
> > > > > the
> > > > > series
> > > > > for review and possible merge.
> > > > >
> > > > Hi Ian
> > > >
> > > > I removed kn->attr_mutex and changed read lock to write lock
> > > > for
> > > > kernfs_refresh_inode
> > > >
> > > > down_write(&kernfs_rwsem);
> > > > kernfs_refresh_inode(kn, inode);
> > > > up_write(&kernfs_rwsem);
> > > >
> > > >
> > > > Unfortunate, changes in this way make things worse, my
> > > > benchmark
> > > > runs
> > > > 100% slower than upstream sysfs. :(
> > > > open+read+close a sysfs file concurrently took 1000us.
> > > > (Currently,
> > > > sysfs with a big mutex kernfs_mutex only takes ~500us
> > > > for one open+read+close operation concurrently)
> > >
> > > Right, so it does need attention nowish.
> > >
> > > I'll have a look at it in a while, I really need to get a new
> > > autofs
> > > release out, and there are quite a few changes, and testing is
> > > seeing
> > > a number of errors, some old, some newly introduced. It's proving
> > > difficult.
> >
> > I've taken a breather for the autofs testing and had a look at
> > this.
>
> Thanks. :)
>
> > I think my original analysis of this was wrong.
> >
> > Could you try this patch please.
> > I'm not sure how much difference it will make but, in principle,
> > it's much the same as the previous approach except it doesn't
> > increase the kernfs node struct size or mess with the other
> > attribute handling code.
> >
> > Note, this is not even compile tested.
>
> I failed to apply this patch. So based on the original six patches, I
> manually removed kn->attr_mutex, and added
> inode_lock/inode_unlock to those two functions, they were like:
>
> int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
> u32 request_mask, unsigned int query_flags)
> {
> struct inode *inode = d_inode(path->dentry);
> struct kernfs_node *kn = inode->i_private;
>
> inode_lock(inode);
> down_read(&kernfs_rwsem);
> kernfs_refresh_inode(kn, inode);
> up_read(&kernfs_rwsem);
> inode_unlock(inode);
>
> generic_fillattr(inode, stat);
> return 0;
> }
>
> int kernfs_iop_permission(struct inode *inode, int mask)
> {
> struct kernfs_node *kn;
>
> if (mask & MAY_NOT_BLOCK)
> return -ECHILD;
>
> kn = inode->i_private;
>
> inode_lock(inode);
> down_read(&kernfs_rwsem);
> kernfs_refresh_inode(kn, inode);
> up_read(&kernfs_rwsem);
> inode_unlock(inode);
>
> return generic_permission(inode, mask);
> }
>
> But I couldn't boot the kernel and there was no error on the screen.
> I guess it was deadlocked on /sys creation?? :D

Right, I guess the locking documentation is out of date. I'm guessing
the inode lock is taken somewhere over the .permission() call. If that
usage is consistent it's easy fixed, if the usage is inconsistent it's
hard to deal with and amounts to a bug.

I'll have another look at it.

Also, it sounds like I'm working from a more recent series.

I had 8 patches, dropped the last three and added the one I posted.
If I can work out what's going on I'll post the series for you to
check.

Ian

>
> > kernfs: use kernfs read lock in .getattr() and .permission()
> >
> > From: Ian Kent <raven@xxxxxxxxxx>
> >
> > From Documenation/filesystems.rst and (slightly outdated) comments
> > in fs/attr.c the inode i_rwsem is used for attribute handling.
> >
> > This lock satisfies the requirememnts needed to reduce lock
> > contention,
> > namely a per-object lock needs to be used rather than a file system
> > global lock with the kernfs node db held stable for read
> > operations.
> >
> > In particular it should reduce lock contention seen when calling
> > the
> > kernfs .permission() method.
> >
> > The inode methods .getattr() and .permission() do not hold the
> > inode
> > i_rwsem lock when called as they are usually read operations. Also
> > the .permission() method checks for rcu-walk mode and returns
> > -ECHILD
> > to the VFS if it is set. So the i_rwsem lock can be used in
> > kernfs_iop_getattr() and kernfs_iop_permission() to protect the
> > inode
> > update done by kernfs_refresh_inode(). Using this lock allows the
> > kernfs node db write lock in these functions to be changed to a
> > read
> > lock.
> >
> > Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
> > ---
> > fs/kernfs/inode.c | 12 ++++++++----
> > 1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> > index ddaf18198935..568037e9efe9 100644
> > --- a/fs/kernfs/inode.c
> > +++ b/fs/kernfs/inode.c
> > @@ -189,9 +189,11 @@ int kernfs_iop_getattr(const struct path
> > *path, struct kstat *stat,
> > struct inode *inode = d_inode(path->dentry);
> > struct kernfs_node *kn = inode->i_private;
> >
> > - down_write(&kernfs_rwsem);
> > + inode_lock(inode);
> > + down_read(&kernfs_rwsem);
> > kernfs_refresh_inode(kn, inode);
> > - up_write(&kernfs_rwsem);
> > + up_read(&kernfs_rwsem);
> > + inode_unlock(inode);
> >
> > generic_fillattr(inode, stat);
> > return 0;
> > @@ -281,9 +283,11 @@ int kernfs_iop_permission(struct inode *inode,
> > int mask)
> >
> > kn = inode->i_private;
> >
> > - down_write(&kernfs_rwsem);
> > + inode_lock(inode);
> > + down_read(&kernfs_rwsem);
> > kernfs_refresh_inode(kn, inode);
> > - up_write(&kernfs_rwsem);
> > + up_read(&kernfs_rwsem);
> > + inode_unlock(inode);
> >
> > return generic_permission(inode, mask);
> > }
> >
>
> thanks,
> fox