Re: [PATCH 1/2] kernfs: add kernfs_ops.free operation to free resources tied to the file

From: Suren Baghdasaryan
Date: Tue Jun 27 2023 - 17:59:19 EST


On Tue, Jun 27, 2023 at 2:43 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Tue, Jun 27, 2023 at 1:09 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > On Tue, Jun 27, 2023 at 11:42 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
> > >
> > > Hello, Christian.
> > >
> > > On Tue, Jun 27, 2023 at 07:30:26PM +0200, Christian Brauner wrote:
> > > ...
> > > > ->release() was added in
> > > >
> > > > commit 0e67db2f9fe91937e798e3d7d22c50a8438187e1
> > > > kernfs: add kernfs_ops->open/release() callbacks
> > > >
> > > > Add ->open/release() methods to kernfs_ops. ->open() is called when
> > > > the file is opened and ->release() when the file is either released or
> > > > severed. These callbacks can be used, for example, to manage
> > > > persistent caching objects over multiple seq_file iterations.
> > > >
> > > > Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> > > > Acked-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > > Acked-by: Acked-by: Zefan Li <lizefan@xxxxxxxxxx>
> > > >
> > > > which mentions "either releases or severed" which imho already points to
> > > > separate methods.
> > >
> > > This is because kernfs has revoking operation which doesn't exist for other
> > > filesystems. Other filesystem implemenations can't just say "I'm done. Bye!"
> > > and go away. Even if the underlying filesystem has completely failed, the
> > > code still has to remain attached and keep aborting operations.
> > >
> > > However, kernfs serves as the midlayer to a lot of device drivers and other
> > > internal subsystems and it'd be really inconvenient for each of them to have
> > > to implement "I want to go away but I gotta wait out this user who's holding
> > > onto my tuning knob file". So, kernfs exposes a revoke or severing semantics
> > > something that's exposing interface through kernfs wants to stop doing so.
> > >
> > > If you look at it from file operation implementation POV, this seems exactly
> > > like ->release. All open files are shutdown and there won't be any future
> > > operations. After all, revoke is forced closing of all fd's. So, for most
> > > users, treating severing just like ->release is the right thing to do.
> > >
> > > The PSI file which caused this is a special case because it attaches
> > > something to its kernfs file which outlives the severing operation bypassing
> > > kernfs infra. A more complete way to fix this would be supporting the
> > > required behavior from kernfs side, so that the PSI file operates on kernfs
> > > interface which knows the severing event and detaches properly. That said,
> > > currently, this is very much an one-off.
> > >
> > > Suren, if you're interested, it might make sense to pipe poll through kernfs
> > > properly so that it has its kernfs operation and kernfs can sever it. That
> > > said, as this is a fix for something which is currently causing crashes,
> > > it'd be better to merge this simpler fix first no matter what.
> >
> > I'm happy to implement the right fix if you go into more details.
> > AFAIKT kernfs_ops already has poll() operation, we are hooking
> > cgroup_file_poll() to it and using kernfs_generic_poll(). I thought
> > this is the right way to pipe poll through kernfs but if that's
> > incorrect, please let me know. I'm happy to fix that.
>
> Ah, sorry, for PSI we are not using kernfs_generic_poll(), so my claim
> misrepresents the situation. Let me look into how
> kernfs_generic_poll() is implemented and maybe I can find a better
> solution for PSI.

Ok in kernfs_generic_poll() we are using kernfs_open_node.poll
waitqueue head for polling and kernfs_open_node is freed from inside
kernfs_unlink_open_file() which is called from kernfs_fop_release().
So, it is destroyed only when the last fput() is done, unlike the
ops->release() operation which we are using for destroying PSI
trigger's waitqueue. So, it seems we still need an operation which
would indicate that the file is truly going away.

Christian's suggestion to rename current ops->release() operation into
ops->drain() (or ops->flush() per Matthew's request) and introduce a
"new" ops->release() which is called only when the last fput() is done
seems sane to me. Would everyone be happy with that approach?


> Thanks,
> Suren.
>
> > Thanks,
> > Suren.
> >
> > >
> > > Thanks.
> >
> > >
> > > --
> > > tejun