Re: [REPOST PATCH v4 4/5] kernfs: use i_lock to protect concurrent inode updates

From: Ian Kent
Date: Wed Jun 02 2021 - 01:42:12 EST


On Tue, 2021-06-01 at 15:18 +0200, Miklos Szeredi wrote:
> On Fri, 28 May 2021 at 08:34, Ian Kent <raven@xxxxxxxxxx> wrote:
> >
> > The inode operations .permission() and .getattr() use the kernfs node
> > write lock but all that's needed is to keep the rb tree stable while
> > updating the inode attributes as well as protecting the update itself
> > against concurrent changes.
> >
> > And .permission() is called frequently during path walks and can cause
> > quite a bit of contention between kernfs node operations and path
> > walks when the number of concurrent walks is high.
> >
> > To change kernfs_iop_getattr() and kernfs_iop_permission() to take
> > the rw sem read lock instead of the write lock an additional lock is
> > needed to protect against multiple processes concurrently updating
> > the inode attributes and link count in kernfs_refresh_inode().
> >
> > The inode i_lock seems like the sensible thing to use to protect these
> > inode attribute updates so use it in kernfs_refresh_inode().
> >
> > Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
> > ---
> >  fs/kernfs/inode.c |   10 ++++++----
> >  fs/kernfs/mount.c |    4 ++--
> >  2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> > index 3b01e9e61f14e..6728ecd81eb37 100644
> > --- a/fs/kernfs/inode.c
> > +++ b/fs/kernfs/inode.c
> > @@ -172,6 +172,7 @@ static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
> >  {
> >         struct kernfs_iattrs *attrs = kn->iattr;
> >
> > +       spin_lock(&inode->i_lock);
> >         inode->i_mode = kn->mode;
> >         if (attrs)
> >                 /*
> > @@ -182,6 +183,7 @@ static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
> >
> >         if (kernfs_type(kn) == KERNFS_DIR)
> >                 set_nlink(inode, kn->dir.subdirs + 2);
> > +       spin_unlock(&inode->i_lock);
> >  }
> >
> >  int kernfs_iop_getattr(struct user_namespace *mnt_userns,
> > @@ -191,9 +193,9 @@ int kernfs_iop_getattr(struct user_namespace *mnt_userns,
> >         struct inode *inode = d_inode(path->dentry);
> >         struct kernfs_node *kn = inode->i_private;
> >
> > -       down_write(&kernfs_rwsem);
> > +       down_read(&kernfs_rwsem);
> >         kernfs_refresh_inode(kn, inode);
> > -       up_write(&kernfs_rwsem);
> > +       up_read(&kernfs_rwsem);
> >
> >         generic_fillattr(&init_user_ns, inode, stat);
> >         return 0;
> > @@ -284,9 +286,9 @@ int kernfs_iop_permission(struct user_namespace *mnt_userns,
> >
> >         kn = inode->i_private;
> >
> > -       down_write(&kernfs_rwsem);
> > +       down_read(&kernfs_rwsem);
> >         kernfs_refresh_inode(kn, inode);
> > -       up_write(&kernfs_rwsem);
> > +       up_read(&kernfs_rwsem);
> >
> >         return generic_permission(&init_user_ns, inode, mask);
> >  }
> > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
> > index baa4155ba2edf..f2f909d09f522 100644
> > --- a/fs/kernfs/mount.c
> > +++ b/fs/kernfs/mount.c
> > @@ -255,9 +255,9 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
> >         sb->s_shrink.seeks = 0;
> >
> >         /* get root inode, initialize and unlock it */
> > -       down_write(&kernfs_rwsem);
> > +       down_read(&kernfs_rwsem);
> >         inode = kernfs_get_inode(sb, info->root->kn);
> > -       up_write(&kernfs_rwsem);
> > +       up_read(&kernfs_rwsem);
> >         if (!inode) {
> >                 pr_debug("kernfs: could not get root inode\n");
> >                 return -ENOMEM;
> >
>
> This last hunk is not mentioned in the patch header.  Why is this
> needed?

Yes, that's right, the header doesn't mention it.

The rwsem is needed here to keep the node rb tree stable.

kernfs_get_inode() calls kernfs_refresh_inode() indirectly.
The i_lock is probably not needed at this point, so this hunk
could just as well have gone into the rwsem change, but because
of that kernfs_refresh_inode() call it also makes sense to put
it here.

I'd prefer to keep it here. Clearly what's going on isn't as
obvious as I thought, so I can add this reasoning to the
description if you still think it's worthwhile.

>
> Otherwise looks good.
>
> Thanks,
> Miklos