Re: [RFC v2 1/4] fs: Add generic file system event notifications

From: Jan Kara
Date: Wed Apr 29 2015 - 03:43:14 EST


On Wed 29-04-15 09:03:08, Beata Michalska wrote:
> On 04/28/2015 07:39 PM, Greg KH wrote:
> > On Tue, Apr 28, 2015 at 04:46:46PM +0200, Beata Michalska wrote:
> >> On 04/28/2015 04:09 PM, Greg KH wrote:
> >>> On Tue, Apr 28, 2015 at 03:56:53PM +0200, Jan Kara wrote:
> >>>> On Mon 27-04-15 17:37:11, Greg KH wrote:
> >>>>> On Mon, Apr 27, 2015 at 05:08:27PM +0200, Beata Michalska wrote:
> >>>>>> On 04/27/2015 04:24 PM, Greg KH wrote:
> >>>>>>> On Mon, Apr 27, 2015 at 01:51:41PM +0200, Beata Michalska wrote:
> >>>>>>>> Introduce a configurable, generic interface for file
> >>>>>>>> system-wide event notifications, to provide file
> >>>>>>>> systems with a common way of reporting any potential
> >>>>>>>> issues as they emerge.
> >>>>>>>>
> >>>>>>>> The notifications are to be issued through the generic
> >>>>>>>> netlink interface, via a newly introduced multicast group.
> >>>>>>>>
> >>>>>>>> Threshold notifications have been included, allowing
> >>>>>>>> an event to be triggered whenever the amount of free space drops
> >>>>>>>> below a certain level (or levels, to be more precise, as two
> >>>>>>>> of them are supported: the lower and the upper range).
> >>>>>>>> The notifications work both ways: once the threshold level
> >>>>>>>> has been reached, an event shall be generated whenever
> >>>>>>>> the number of available blocks goes up again re-activating
> >>>>>>>> the threshold.
> >>>>>>>>
> >>>>>>>> The interface has been exposed through a virtual filesystem. Once mounted,
> >>>>>>>> it serves as an entry point for the set-up where one can
> >>>>>>>> register for particular file system events.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Beata Michalska <b.michalska@xxxxxxxxxxx>
> >>>>>>>> ---
> >>>>>>>> Documentation/filesystems/events.txt | 231 ++++++++++
> >>>>>>>> fs/Makefile | 1 +
> >>>>>>>> fs/events/Makefile | 6 +
> >>>>>>>> fs/events/fs_event.c | 770 ++++++++++++++++++++++++++++++++++
> >>>>>>>> fs/events/fs_event.h | 25 ++
> >>>>>>>> fs/events/fs_event_netlink.c | 99 +++++
> >>>>>>>> fs/namespace.c | 1 +
> >>>>>>>> include/linux/fs.h | 6 +-
> >>>>>>>> include/linux/fs_event.h | 58 +++
> >>>>>>>> include/uapi/linux/fs_event.h | 54 +++
> >>>>>>>> include/uapi/linux/genetlink.h | 1 +
> >>>>>>>> net/netlink/genetlink.c | 7 +-
> >>>>>>>> 12 files changed, 1257 insertions(+), 2 deletions(-)
> >>>>>>>> create mode 100644 Documentation/filesystems/events.txt
> >>>>>>>> create mode 100644 fs/events/Makefile
> >>>>>>>> create mode 100644 fs/events/fs_event.c
> >>>>>>>> create mode 100644 fs/events/fs_event.h
> >>>>>>>> create mode 100644 fs/events/fs_event_netlink.c
> >>>>>>>> create mode 100644 include/linux/fs_event.h
> >>>>>>>> create mode 100644 include/uapi/linux/fs_event.h
> >>>>>>>
> >>>>>>> Any reason why you just don't do uevents for the block devices today,
> >>>>>>> and not create a new type of netlink message and userspace tool required
> >>>>>>> to read these?
> >>>>>>
> >>>>>> The idea here is to have support for filesystems with no backing device as well.
> >>>>>> Parsing the message with libnl is really simple and requires only a few lines of code
> >>>>>> (a sample application was presented in the initial version of this RFC).
> >>>>>
> >>>>> I'm not saying it's not "simple" to parse, just that now you are doing
> >>>>> something that requires a different tool. If you have a block device,
> >>>>> you should be able to emit uevents for it, you don't need a backing
> >>>>> device, we handle virtual filesystems in /sys/block/ just fine :)
> >>>>>
> >>>>> People already have tools that listen to libudev for system monitoring
> >>>>> and management, why require them to hook up to yet-another-library? And
> >>>>> what is going to provide the ability for multiple userspace tools to
> >>>>> listen to these netlink messages in case you have more than one program
> >>>>> that wants to watch for these things (i.e. multiple desktop filesystem
> >>>>> monitoring tools, system-health checkers, etc.)?
> >>>> As much as I understand your concerns I'm not convinced the uevent interface
> >>>> is a good fit. There are filesystems that don't have an underlying block
> >>>> device - think of e.g. tmpfs or filesystems working directly on top of
> >>>> flash devices. These still want to send notifications to userspace (one of the
> >>>> primary motivations for this interface was so that tmpfs can notify about
> >>>> something). And creating some fake nodes in /sys/block for tmpfs and
> >>>> similar filesystems seems like doing more harm than good to me...
> >>>
> >>> If these are "fake" block devices, what's going to be present in the
> >>> block major/minor fields of the netlink message? For some reason I
> >>> thought it was a required field, and because of that, I thought we had a
> >>> "real" filesystem somewhere to refer to, otherwise how would userspace
> >>> know what filesystem was creating these events?
> >>>
> >>> What am I missing here?
> >>>
> >>> confused,
> >>>
> >>> greg k-h
> >>>
> >>
> >> For those 'fake' block devs, get_anon_bdev assigns the major:minor
> >> numbers at mount time. Userspace can get those through stat().
> >
> > How can userspace do the mapping backwards from this "anonymous"
> > major:minor number for these types of filesystems in such a way that
> > they can "know" how to report the block device that is causing the
> > event?
> >
> > thanks,
> >
> > greg k-h
> >
>
> It needs to be done internally by the app but is doable.
> The app knows what it is watching, so it can maintain the mappings.
> So prior to activating the notifications it can call 'stat' on the mount point.
> The stat struct provides 'st_dev', which is the device id. The same id is reported
> within the message payload (as major:minor numbers). With this,
> the app is able to get any other information it needs.
> Note that the events refer to the file system as a whole and they may not
> necessarily have anything to do with the actual block device.
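A minimal C sketch of the stat() approach described above. How the major:minor pair is extracted from the netlink payload is not shown; ev_major and ev_minor are hypothetical inputs standing in for it, and event_matches_mount() is an illustrative helper, not something the patch provides.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor() */

/*
 * Hypothetical helper: return true if the device numbers carried in an
 * event (ev_major/ev_minor, assumed already parsed out of the message)
 * identify the filesystem mounted at `mnt`.
 */
bool event_matches_mount(const char *mnt,
                         unsigned int ev_major, unsigned int ev_minor)
{
    struct stat st;

    /* stat() on a mount point reports the root of the mounted fs, so
     * st_dev is that filesystem's device id; for tmpfs and similar it
     * is the anonymous dev_t handed out by get_anon_bdev at mount. */
    if (stat(mnt, &st) != 0) {
        perror(mnt);
        return false;
    }
    return major(st.st_dev) == ev_major && minor(st.st_dev) == ev_minor;
}
```

The app would call this (or cache st_dev per watched mount point) before enabling notifications, then compare each incoming event against the cached values.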
Or you can use /proc/self/mountinfo for the mapping. It lists the device
numbers, the real device names if applicable, and the mount points. This
has the advantage that it keeps working even if filesystem mount points
change, since the app can look the device up at the time the event arrives.
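For illustration, a rough C sketch of the /proc/self/mountinfo lookup, assuming the column layout documented in proc(5): column 3 is major:minor, column 5 is the mount point, and after a lone "-" separator come the filesystem type and the mount source. find_mount_by_dev() is a hypothetical helper. Bind mounts of one filesystem share device numbers, so it returns only the first match, and octal escapes such as \040 in paths are left undecoded.

```c
#define _POSIX_C_SOURCE 200809L  /* getline(), strtok_r() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find the mount whose device numbers match the
 * event and copy its mount point and mount source. Returns 0 if found,
 * -1 otherwise (including when mountinfo cannot be opened).
 */
int find_mount_by_dev(unsigned int ev_major, unsigned int ev_minor,
                      char *mnt, size_t mntlen, char *src, size_t srclen)
{
    FILE *f = fopen("/proc/self/mountinfo", "r");
    char *line = NULL;
    size_t cap = 0;
    int ret = -1;

    if (!f)
        return -1;

    while (ret != 0 && getline(&line, &cap, f) != -1) {
        char *save = NULL, *tok, *fstype = NULL, *source = NULL;
        char *field[5] = { NULL };
        unsigned int maj, min;
        int i = 0;

        line[strcspn(line, "\n")] = '\0';

        /* Columns 1-5: mount ID, parent ID, major:minor, root, mount point. */
        for (tok = strtok_r(line, " ", &save); tok && i < 5;
             tok = strtok_r(NULL, " ", &save))
            field[i++] = tok;
        if (i < 5 || sscanf(field[2], "%u:%u", &maj, &min) != 2)
            continue;
        if (maj != ev_major || min != ev_minor)
            continue;

        /* Skip mount options and optional fields up to the lone "-";
         * the two tokens after it are the fs type and mount source. */
        while (tok && strcmp(tok, "-") != 0)
            tok = strtok_r(NULL, " ", &save);
        if (tok)
            fstype = strtok_r(NULL, " ", &save);
        if (fstype)
            source = strtok_r(NULL, " ", &save);

        snprintf(mnt, mntlen, "%s", field[4]);
        snprintf(src, srclen, "%s", source ? source : "");
        ret = 0;
    }

    free(line);
    fclose(f);
    return ret;
}
```

Because the file is re-read on each lookup, a filesystem that was moved or remounted elsewhere is still resolved to its current mount point.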

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR