Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

From: Fox Chen
Date: Thu Dec 17 2020 - 03:55:58 EST


On Thu, Dec 17, 2020 at 12:46 PM Ian Kent <raven@xxxxxxxxxx> wrote:
>
> On Tue, 2020-12-15 at 20:59 +0800, Ian Kent wrote:
> > On Tue, 2020-12-15 at 16:33 +0800, Fox Chen wrote:
> > > On Mon, Dec 14, 2020 at 9:30 PM Ian Kent <raven@xxxxxxxxxx> wrote:
> > > > On Mon, 2020-12-14 at 14:14 +0800, Fox Chen wrote:
> > > > > On Sun, Dec 13, 2020 at 11:46 AM Ian Kent <raven@xxxxxxxxxx>
> > > > > wrote:
> > > > > > On Fri, 2020-12-11 at 10:17 +0800, Ian Kent wrote:
> > > > > > > On Fri, 2020-12-11 at 10:01 +0800, Ian Kent wrote:
> > > > > > > > > > For the patches, there is a mutex_lock in
> > > > > > > > > > kn->attr_mutex, as Tejun mentioned here
> > > > > > > > > > (https://lore.kernel.org/lkml/X8fe0cmu+aq1gi7O@xxxxxxxxxxxxxxx/),
> > > > > > > > > > maybe a global rwsem for kn->iattr will be better??
> > > > > > > >
> > > > > > > > > I wasn't sure about that, IIRC a spin lock could be used
> > > > > > > > > around the initial check and checked again at the end which
> > > > > > > > > would probably have been much faster but much less
> > > > > > > > > conservative and a bit more ugly so I just went the
> > > > > > > > > conservative path since there was so much change already.
> > > > > > >
> > > > > > > > Sorry, I hadn't looked at Tejun's reply yet and TBH didn't
> > > > > > > > remember it.
> > > > > > >
> > > > > > > Based on what Tejun said it sounds like that needs work.
> > > > > >
> > > > > > > Those attribute handling patches were meant to allow taking
> > > > > > > the rw sem read lock instead of the write lock for
> > > > > > > kernfs_refresh_inode() updates, with the added locking to
> > > > > > > protect the inode attributes update since it's called from
> > > > > > > the VFS both with and without the inode lock.
> > > > >
> > > > > > Oh, understood. I was also asking because the lock on
> > > > > > kn->attr_mutex drags down concurrent performance.
> > > > >
> > > > > > > Looking around it looks like kernfs_iattrs() is called from
> > > > > > > multiple places without a node database lock at all.
> > > > > > >
> > > > > > > I'm thinking that, to keep my proposed change straightforward
> > > > > > > and on topic, I should just leave kernfs_refresh_inode()
> > > > > > > taking the node db write lock for now and consider the
> > > > > > > attributes handling as a separate change. Once that's done we
> > > > > > > could reconsider what's needed to use the node db read lock
> > > > > > > in kernfs_refresh_inode().
> > > > >
> > > > > You meant taking write lock of kernfs_rwsem for
> > > > > kernfs_refresh_inode()??
> > > > > It may be a lot slower in my benchmark, let me test it.
> > > >
> > > > Yes, but make sure the write lock of kernfs_rwsem is being taken
> > > > not the read lock.
> > > >
> > > > That's a mistake I had initially?
> > > >
> > > > > Still, that attributes handling is, I think, sufficient to
> > > > > warrant a separate change since it looks like it might need
> > > > > work, the kernfs node db probably should be kept stable for
> > > > > those attribute updates but equally the existence of an
> > > > > instantiated dentry might mitigate it.
> > > >
> > > > > Some people might just know whether it's ok or not but I would
> > > > > like to check the callers to work out what's going on.
> > > > >
> > > > > In any case it's academic if GCH isn't willing to consider the
> > > > > series for review and possible merge.
> > > >
> > > Hi Ian
> > >
> > > I removed kn->attr_mutex and changed read lock to write lock for
> > > kernfs_refresh_inode
> > >
> > > down_write(&kernfs_rwsem);
> > > kernfs_refresh_inode(kn, inode);
> > > up_write(&kernfs_rwsem);
> > >
> > >
> > > > Unfortunately, changing it this way makes things worse: my
> > > > benchmark runs 100% slower than upstream sysfs. :(
> > > > A concurrent open+read+close of a sysfs file took 1000us.
> > > > (Currently, sysfs with the big kernfs_mutex only takes ~500us
> > > > for one concurrent open+read+close operation.)
> >
> > Right, so it does need attention nowish.
> >
> > I'll have a look at it in a while, I really need to get a new autofs
> > release out, and there are quite a few changes, and testing is seeing
> > a number of errors, some old, some newly introduced. It's proving
> > difficult.
>
> I've taken a breather from the autofs testing and had a look at this.

Thanks. :)

> I think my original analysis of this was wrong.
>
> Could you try this patch please.
> I'm not sure how much difference it will make but, in principle,
> it's much the same as the previous approach except it doesn't
> increase the kernfs node struct size or mess with the other
> attribute handling code.
>
> Note, this is not even compile tested.

I couldn't get this patch to apply. So, on top of the original six
patches, I manually removed kn->attr_mutex and added
inode_lock()/inode_unlock() to those two functions. They ended up like this:

int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
		       u32 request_mask, unsigned int query_flags)
{
	struct inode *inode = d_inode(path->dentry);
	struct kernfs_node *kn = inode->i_private;

	inode_lock(inode);
	down_read(&kernfs_rwsem);
	kernfs_refresh_inode(kn, inode);
	up_read(&kernfs_rwsem);
	inode_unlock(inode);

	generic_fillattr(inode, stat);
	return 0;
}

int kernfs_iop_permission(struct inode *inode, int mask)
{
	struct kernfs_node *kn;

	if (mask & MAY_NOT_BLOCK)
		return -ECHILD;

	kn = inode->i_private;

	inode_lock(inode);
	down_read(&kernfs_rwsem);
	kernfs_refresh_inode(kn, inode);
	up_read(&kernfs_rwsem);
	inode_unlock(inode);

	return generic_permission(inode, mask);
}

But the kernel wouldn't boot, and there was no error on the screen.
I guess it deadlocked on /sys creation?? :D
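
To illustrate how a hang like this could come about, here is a minimal
userspace sketch, not kernel code. It assumes (this is not confirmed
anywhere in this thread) that some VFS path calls ->permission() on an
inode whose lock the caller already holds, for example a create path
that locks the parent directory and then checks permission on it. The
fake_* names are hypothetical stand-ins, and a non-recursive pthread
mutex plays the role of the inode lock. trylock is used so the sketch
reports the problem instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct inode: only a non-recursive lock. */
struct fake_inode {
	pthread_mutex_t i_lock;
};

/*
 * Stand-in for kernfs_iop_permission() with the added inode locking.
 * A real inode_lock() would block here; trylock returns EBUSY instead.
 */
int fake_permission(struct fake_inode *inode)
{
	int err = pthread_mutex_trylock(&inode->i_lock);

	if (err)
		return -err;	/* -EBUSY: the lock is already held */
	/* ... the attribute refresh would go here ... */
	pthread_mutex_unlock(&inode->i_lock);
	return 0;
}

/*
 * Stand-in for an assumed VFS path that locks the directory first and
 * then checks permission on that same directory.
 */
int fake_create(struct fake_inode *dir)
{
	int ret;

	pthread_mutex_lock(&dir->i_lock);
	ret = fake_permission(dir);	/* tries to take dir->i_lock again */
	pthread_mutex_unlock(&dir->i_lock);
	return ret;
}

/* Usage: the direct call succeeds, the nested call cannot get the lock. */
void deadlock_sketch_check(void)
{
	struct fake_inode dir;

	pthread_mutex_init(&dir.i_lock, NULL);
	assert(fake_permission(&dir) == 0);
	assert(fake_create(&dir) == -EBUSY);
	pthread_mutex_destroy(&dir.i_lock);
}
```

With a blocking lock in place of trylock, fake_create() would never
return, which would match a silent hang with nothing printed.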

> kernfs: use kernfs read lock in .getattr() and .permission()
>
> From: Ian Kent <raven@xxxxxxxxxx>
>
> From Documentation/filesystems.rst and (slightly outdated) comments
> in fs/attr.c the inode i_rwsem is used for attribute handling.
>
> This lock satisfies the requirements needed to reduce lock contention,
> namely a per-object lock needs to be used rather than a file system
> global lock with the kernfs node db held stable for read operations.
>
> In particular it should reduce lock contention seen when calling the
> kernfs .permission() method.
>
> The inode methods .getattr() and .permission() do not hold the inode
> i_rwsem lock when called as they are usually read operations. Also
> the .permission() method checks for rcu-walk mode and returns -ECHILD
> to the VFS if it is set. So the i_rwsem lock can be used in
> kernfs_iop_getattr() and kernfs_iop_permission() to protect the inode
> update done by kernfs_refresh_inode(). Using this lock allows the
> kernfs node db write lock in these functions to be changed to a read
> lock.
>
> Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
> ---
> fs/kernfs/inode.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> index ddaf18198935..568037e9efe9 100644
> --- a/fs/kernfs/inode.c
> +++ b/fs/kernfs/inode.c
> @@ -189,9 +189,11 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
>  	struct inode *inode = d_inode(path->dentry);
>  	struct kernfs_node *kn = inode->i_private;
>
> -	down_write(&kernfs_rwsem);
> +	inode_lock(inode);
> +	down_read(&kernfs_rwsem);
>  	kernfs_refresh_inode(kn, inode);
> -	up_write(&kernfs_rwsem);
> +	up_read(&kernfs_rwsem);
> +	inode_unlock(inode);
>
>  	generic_fillattr(inode, stat);
>  	return 0;
> @@ -281,9 +283,11 @@ int kernfs_iop_permission(struct inode *inode, int mask)
>
>  	kn = inode->i_private;
>
> -	down_write(&kernfs_rwsem);
> +	inode_lock(inode);
> +	down_read(&kernfs_rwsem);
>  	kernfs_refresh_inode(kn, inode);
> -	up_write(&kernfs_rwsem);
> +	up_read(&kernfs_rwsem);
> +	inode_unlock(inode);
>
>  	return generic_permission(inode, mask);
>  }
>


thanks,
fox