Re: Upcoming: Notifications, FS notifications and fsinfo()

From: Lennart Poettering
Date: Tue Mar 31 2020 - 03:28:57 EST


On Mo, 30.03.20 23:17, Christian Brauner (christian.brauner@xxxxxxxxxx) wrote:

> Fwiw, putting down my kernel hat and speaking as someone who maintains
> two container runtimes and various other low-level bits and pieces in
> userspace who'd make heavy use of this stuff I would prefer the fd-based
> fsinfo() approach especially in the light of across namespace
> operations, querying all properties of a mount atomically all-at-once,
> and safe delegation through fds. Another heavy user of this would be
> systemd (Cced Lennart who I've discussed this with) which would prefer
> the fd-based approach as well. I think pulling this into a filesystem
> and making userspace parse around in a filesystem tree to query mount
> information is the wrong approach and will get messy pretty quickly
> especially in the face of mount and user namespace interactions and
> various other pitfalls. fsinfo() fits quite nicely with the all-fd-based
> approach of the whole mount api. So yes, definitely preferred from my
> end.

Christian is right. I think it's very important to have an API that
allows to query the state of fs attributes in a consistent state,
i.e. so that the attributes userspace is interested in can be queried
in a single call, so they all describe the very same point in
time. Distributing attributes onto multiple individual files just
sucks, because it's then guaranteed that you never can read them in a
way they actually fit together, some attributes you read will be
older, others newer. It's a big design flaw of sysfs (which is
structured like this) if you ask me.

I don't really care if the kernel API for this is binary or
textual. Slight preference for binary, but I don't care too much.

I think it would be wise to bind such APIs to fds, simply because it
always works. Doing path based stuff sucks, because you always need to
mount stuff and have a path tree set up, which is less ideal in a
world where namespacing is common, and namespaces are a shared concept
(at least with your other threads, if not with other processes). As a
maintainer of an init system I really dislike APIs that I can only use
after a mount structure has been set up, too often we want to do stuff
before that. Moreover, philosophically I find it questionnable to use
path based APIs to interface with the path object hierarchy itself. it
feels "too recursive". Just keep this separate: build stuff on top of
the fs that fits on top of the fs, but don't build fs APIs on top of
fs APIs that stem from the same layer.

Summary: atomic APIs rock, fd-based APIs rock. APIs built on
individual files one can only read individually suck. APIs of the path
layer exposed in the path layer suck.

Hope this makes some sense?

Lennart