Re: [PATCH 1/2] kernfs: add kernfs_ops.free operation to free resources tied to the file

From: Christian Brauner
Date: Tue Jun 27 2023 - 13:23:24 EST


On Tue, Jun 27, 2023 at 10:03:15AM -0700, Suren Baghdasaryan wrote:
> On Mon, Jun 26, 2023 at 11:25 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, Jun 26, 2023 at 01:17:12PM -0700, Suren Baghdasaryan wrote:
> > > kernfs_ops.release operation can be called from kernfs_drain_open_files
> > > which is not tied to the file's real lifecycle. Introduce a new kernfs_ops
> > > free operation which is called only when the last fput() of the file is
> > > performed and therefore is strictly tied to the file's lifecycle. This
> > > operation will be used for freeing resources tied to the file, like
> > > waitqueues used for polling the file.
> >
> > This is confusing, shouldn't release be the "last" time the file is
> > handled and then all resources attached to it freed? Why do we need
> > another callback, shouldn't release handle this?
>
> That is what I thought too but apparently kernfs_drain_open_files()
> can also cause ops->release to be called while the file keeps on
> living (see details here:
> https://lore.kernel.org/all/CAJuCfpFZ3B4530TgsSHqp5F_gwfrDujwRYewKReJru==MdEHQg@xxxxxxxxxxxxxx/#t).
>
> >
> >
> > >
> > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > > ---
> > > fs/kernfs/file.c | 8 +++++---
> > > include/linux/kernfs.h | 5 +++++
> > > 2 files changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> > > index 40c4661f15b7..acc52d23d8f6 100644
> > > --- a/fs/kernfs/file.c
> > > +++ b/fs/kernfs/file.c
> > > @@ -766,7 +766,7 @@ static int kernfs_fop_open(struct inode *inode, struct file *file)
> > >
> > > /* used from release/drain to ensure that ->release() is called exactly once */
> > > static void kernfs_release_file(struct kernfs_node *kn,
> > > - struct kernfs_open_file *of)
> > > + struct kernfs_open_file *of, bool final)
> >
> > Adding flags to functions like this are a pain, now we need to look it
> > up every time to see what that bool means.
> >
> > And when we do, we see that it is not documented here so we have no idea
> > of what it is :(
> >
> > This is not going to be maintainable as-is, sorry.
>
> It's a static function with only two places it's used in the same
> file. I can add documentation too if that helps.
>
> >
> > > {
> > > /*
> > > * @of is guaranteed to have no other file operations in flight and
> > > @@ -787,6 +787,8 @@ static void kernfs_release_file(struct kernfs_node *kn,
> > > of->released = true;
> > > of_on(of)->nr_to_release--;
> > > }
> > > + if (final && kn->attr.ops->free)
> > > + kn->attr.ops->free(of);
> > > }
> > >
> > > static int kernfs_fop_release(struct inode *inode, struct file *filp)
> > > @@ -798,7 +800,7 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
> > > struct mutex *mutex;
> > >
> > > mutex = kernfs_open_file_mutex_lock(kn);
> > > - kernfs_release_file(kn, of);
> > > + kernfs_release_file(kn, of, true);
> > > mutex_unlock(mutex);
> > > }
> > >
> > > @@ -852,7 +854,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
> > > }
> > >
> > > if (kn->flags & KERNFS_HAS_RELEASE)
> > > - kernfs_release_file(kn, of);
> > > + kernfs_release_file(kn, of, false);
> >
> > Why isn't this also the "last" time things are touched here? why is it
> > false?
>
> Because it's called from the context of the process doing rmdir() and
> if another process has the file in the directory opened it will have
> that file alive until it calls the last fput(). These are the call
> paths:
>
> do_rmdir
> cgroup_rmdir
> kernfs_drain_open_files
> kernfs_release_file(..., false)
> kn->attr.ops->release(), of->released=true

This seems weird to me. Why would that trigger a ->release() call. In
general, calling ->release() kinda betrays the name.
So imho, this really wants to be a separate ->drain() or ->shutdown()
call (and seems conceptually related to f_op->flush()).

>
> fput()
> kernfs_fop_release()
> kernfs_release_file(..., true), of->released==true,
> kn->attr.ops->release() is not called.