Re: [PATCH 6/6] fanotify: add current_user_instances node

From: Amir Goldstein
Date: Tue Jun 28 2022 - 11:37:46 EST


On Tue, Jun 28, 2022 at 5:25 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Tue, Jun 28, 2022 at 04:55:25PM +0300, Amir Goldstein wrote:
> > On Tue, Jun 28, 2022 at 3:56 PM Jan Kara <jack@xxxxxxx> wrote:
> > >
> > > On Tue 28-06-22 15:29:08, Amir Goldstein wrote:
> > > > On Tue, Jun 28, 2022 at 2:50 PM guowei du <duguoweisz@xxxxxxxxx> wrote:
> > > > >
> > > > > hi, Mr Kara, Mr Brauner,
> > > > >
> > > > > I want to know how many fanotify readers are monitoring the fs event.
> > > > > If userspace daemons monitoring all file system events are too many, maybe there will be an impact on performance.
> > > >
> > > > I want something else which is more than just the number of groups.
> > > >
> > > > I want to provide the admin the option to enumerate over all groups and
> > > > list their marks and blocked events.
> > >
> > > Listing all groups and marks makes sense to me. Often enough I was
> > > extracting this information from a crashdump :).
> > >
> > > Dumping of events may be a bit more challenging (especially as we'd need to
> > > format the events which has some non-trivial implications) so I'm not 100%
> > > convinced about that. I agree it might be useful but I'd have to see the
> > > implementation...
> > >
> >
> > I don't really care about the events.
> > I would like to list the tasks that are blocked on permission events
> > and the fanotify reader process that blocks them, so that it could be killed.
> >
> > Technically, it is enough to list the blocked task pids in fanotify_fdinfo().
> > But it is also low hanging to print the number of queued events
> > in fanotify_fdinfo() and inotify_fdinfo().
>
> That's always going to be racy, right? You might list the blocked tasks
> but it's impossible for userspace to ensure that the pids it parses
> still refer to the same processes by the time it tries to kill them.
>
> You would need an interface that allows you to kill specific blocked
> tasks or at least all blocked tasks. You could just make this an - ahem
> - ioctl on a suitable fanotify fd and somehow ensure that the task is
> actually the one you want to kill?

I don't want to kill the blocked tasks.
I want to kill the permission event reader process that is blocking them,
or abort the blocking group without terminating the process, using a
technique similar to FUSE connection abort.

It is an emergency button for admin when all users get blocked
from accessing files.

The problem with mandatory locks IMO was not the fact that they
could be used to DoS other users, but the fact that there was no
escape door for admin override.

Windows servers have mandatory file locks, but they also have
an escape door for admin override:
https://www.technipages.com/windows-how-to-release-file-lock.

fanotify can be used to DoS users, and the admin currently has no
good tools to cope with that.

>
> If you can avoid adding a whole new /sys/kernel/fanotify/ interface
> that'd be quite nice for userspace, I think.

On the contrary. I think users will prefer enumerating the groups
in /sys/kernel/fanotify/ to enumerating all fds of all procs
looking for fanotify fds. The lsof method is neither efficient nor
scalable when you have many thousands of tasks and just one blocker.
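
For reference, the lsof-style scan looks roughly like the sketch below.
It is only an illustration of the cost: it walks every fd of every task
to find the one blocker. It assumes a fanotify fd shows up in
/proc/<pid>/fd as a link to "anon_inode:[fanotify]"; the function name
and the proc_root parameter are made up for this example.

```python
import os

# Assumed link target of a fanotify group fd under /proc/<pid>/fd.
FANOTIFY_LINK = "anon_inode:[fanotify]"


def find_fanotify_fds(proc_root="/proc"):
    """Return a sorted list of (pid, fd) pairs that look like fanotify fds."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a task directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # task exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were scanning
            if target == FANOTIFY_LINK:
                found.append((int(entry), int(fd)))
    return sorted(found)
```

The work is proportional to the total number of open fds on the system,
and the result can be stale by the time it is returned.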

w.r.t races, it is possible that /sys/kernel/fanotify/ could be used
to acquire some sort of fanotify fd clones that can only be used for
ioctls and not for read/write.

An ioctl can return the number of blocked tasks and possibly
their pidfds for further inspection.

And of course an ABORT or SHUTDOWN ioctl to cancel all
blocked permission events and stop queueing events.

The same fd clone could also be acquired by opening
/proc/<fanotify_proc>/fd/<fanotify_fd>
to perform ABORT in case killing the process does not
work because the process itself is blocked on IO.

Current fanotify is not immune to this sort of deadlock.
Similar deadlocks are described in the FUSE documentation,
in the section on aborting a connection:
https://www.kernel.org/doc/html/latest/filesystems/fuse.html#aborting-a-filesystem-connection

Thanks,
Amir.