Re: [PATCH 1/2] kernfs: add kernfs_ops.free operation to free resources tied to the file

From: Suren Baghdasaryan
Date: Tue Jun 27 2023 - 13:03:33 EST


On Mon, Jun 26, 2023 at 11:25 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Jun 26, 2023 at 01:17:12PM -0700, Suren Baghdasaryan wrote:
> > kernfs_ops.release operation can be called from kernfs_drain_open_files
> > which is not tied to the file's real lifecycle. Introduce a new kernfs_ops
> > free operation which is called only when the last fput() of the file is
> > performed and therefore is strictly tied to the file's lifecycle. This
> > operation will be used for freeing resources tied to the file, like
> > waitqueues used for polling the file.
>
> This is confusing, shouldn't release be the "last" time the file is
> handled and then all resources attached to it freed? Why do we need
> another callback, shouldn't release handle this?

That is what I thought too but apparently kernfs_drain_open_files()
can also cause ops->release to be called while the file keeps on
living (see details here:
https://lore.kernel.org/all/CAJuCfpFZ3B4530TgsSHqp5F_gwfrDujwRYewKReJru==MdEHQg@xxxxxxxxxxxxxx/#t).

>
>
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > ---
> > fs/kernfs/file.c | 8 +++++---
> > include/linux/kernfs.h | 5 +++++
> > 2 files changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> > index 40c4661f15b7..acc52d23d8f6 100644
> > --- a/fs/kernfs/file.c
> > +++ b/fs/kernfs/file.c
> > @@ -766,7 +766,7 @@ static int kernfs_fop_open(struct inode *inode, struct file *file)
> >
> > /* used from release/drain to ensure that ->release() is called exactly once */
> > static void kernfs_release_file(struct kernfs_node *kn,
> > - struct kernfs_open_file *of)
> > + struct kernfs_open_file *of, bool final)
>
> Adding flags to functions like this are a pain, now we need to look it
> up every time to see what that bool means.
>
> And when we do, we see that it is not documented here so we have no idea
> of what it is :(
>
> This is not going to be maintainable as-is, sorry.

It's a static function with only two places it's used in the same
file. I can add documentation too if that helps.

>
> > {
> > /*
> > * @of is guaranteed to have no other file operations in flight and
> > @@ -787,6 +787,8 @@ static void kernfs_release_file(struct kernfs_node *kn,
> > of->released = true;
> > of_on(of)->nr_to_release--;
> > }
> > + if (final && kn->attr.ops->free)
> > + kn->attr.ops->free(of);
> > }
> >
> > static int kernfs_fop_release(struct inode *inode, struct file *filp)
> > @@ -798,7 +800,7 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
> > struct mutex *mutex;
> >
> > mutex = kernfs_open_file_mutex_lock(kn);
> > - kernfs_release_file(kn, of);
> > + kernfs_release_file(kn, of, true);
> > mutex_unlock(mutex);
> > }
> >
> > @@ -852,7 +854,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
> > }
> >
> > if (kn->flags & KERNFS_HAS_RELEASE)
> > - kernfs_release_file(kn, of);
> > + kernfs_release_file(kn, of, false);
>
> Why isn't this also the "last" time things are touched here? why is it
> false?

Because it's called from the context of the process doing rmdir() and
if another process has the file in the directory opened it will have
that file alive until it calls the last fput(). These are the call
paths:

do_rmdir
cgroup_rmdir
kernfs_drain_open_files
kernfs_release_file(..., false)
kn->attr.ops->release(), of->released=true

fput()
kernfs_fop_release()
kernfs_release_file(..., true), of->released==true,
kn->attr.ops->release() is not called.

So, when kn->attr.ops->release() is called by do_rmdir() the file is
still kept alive by another process holding a reference to the file.
It's a problem in our case because if we free the resources associated
with that file (waitqueue head) from inside our release() operation
then the ongoing poll() operation on that file will step on that freed
resource.

>
>
> > }
> >
> > WARN_ON_ONCE(on->nr_mmapped || on->nr_to_release);
> > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> > index 73f5c120def8..a7e404ff31bb 100644
> > --- a/include/linux/kernfs.h
> > +++ b/include/linux/kernfs.h
> > @@ -273,6 +273,11 @@ struct kernfs_ops {
> > */
> > int (*open)(struct kernfs_open_file *of);
> > void (*release)(struct kernfs_open_file *of);
> > + /*
> > + * Free resources tied to the lifecycle of the file, like a
> > + * waitqueue used for polling.
> > + */
> > + void (*free)(struct kernfs_open_file *of);
>
> I agree with Tejun, this needs to be documented much better and show how
> you really should never need to use this :)

I'll be happy not to use it if we can fix the release() to be called
only when the file is truly going away.
Thanks,
Suren.

>
> thanks,
>
> greg k-h