Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

From: Christian Brauner
Date: Mon Jan 08 2024 - 06:05:39 EST


> > * Tracefs supports the creation of instances from userspace via mkdir.
> > For example,
> >
> > mkdir /sys/kernel/tracing/instances/foo
> >
> > And here the idmapping is relevant so we need to make the helpers
> > aware of the idmapping.
> >
> > I just went and plumbed this through to most helpers.
> >
> > There's some subtlety in eventfs. Afaict, the directories and files for
> > the individual events are created on-demand during lookup or readdir.
> >
> > The ownership of these events is again inherited from the parent inode
> > or recovered from stored state. In both cases the actual idmapping is
> > irrelevant.
> >
> > The callchain here is:
> >
> > eventfs_root_lookup("xfs", "events")
> > -> create_{dir,file}_dentry("xfs", "events")
> > -> create_{dir,file}("xfs", "events")
> > -> eventfs_start_creating("xfs", "events")
> > -> lookup_one_len("xfs", "events")
> >
> > And the subtlety is that lookup_one_len() does permission checking on
> > the parent inode (IOW, if you want a dentry for "blech" under "events"
> > it'll do a permission check on events->d_inode) for exec permissions
> > and then goes on to give you a new dentry.
> >
> > Usually this call would have to be changed to lookup_one() and the
> > idmapping be handed down to it. But I think that's irrelevant here.
> >
> > Lookup generally doesn't need to be aware of idmappings at all. The
> > permission checking is done purely in the vfs via may_lookup() and the
> > idmapping is irrelevant because we always initialize inodes with the
> > filesystem level ownership (see the idmappings.rst) documentation if
> > you're interested in excessive details (otherwise you get inode aliases
> > which you really don't want).
> >
> > For tracefs it would not matter for lookup per se but only because
> > tracefs seemingly creates inodes/dentries during lookup (and readdir()).
>
> tracefs creates the inodes/dentries at boot up, it's eventfs that does
> it on demand during lookup.
>
> For inodes/dentries:
>
> /sys/kernel/tracing/* is all created at boot up, except for "events".

Yes.

> /sys/kernel/tracing/events/* is created on demand.

Yes.

>
> >
> > But imho the permission checking done in current eventfs_root_lookup()
> > via lookup_one_len() is meaningless in any way; possibly even
> > (conceptually) wrong.
> >
> > Because, the actual permission checking for the creation of the eventfs
> > entries isn't really done during lookup or readdir, it's done when mkdir
> > is called:
> >
> > mkdir /sys/kernel/tracing/instances/foo
>
> No. that creates a entire new tracefs instance, which happens to
> include another eventfs directory.

Yes, I'm aware of all that.

> No. Only the meta data is created for the eventfs directory with a
> mkdir instances/foo. The inodes and dentries are not there.

I know, that is what I'm saying.

>
> >
> > When one goes and looksup stuff under foo/events/ or readdir the entries
> > in that directory:
> >
> > fd = open("foo/events")
> > readdir(fd, ...)
> >
> > then they are licensed to list an entry in that directory. So all that
> > needs to be done is to actually list those files in that directory. And
> > since they already exist (they were created during mkdir) we just need
> > to splice in inodes and dentries for them. But for that we shouldn't
> > check permissions on the directory again. Because we've done that
> > already correctly when the VFS called may_lookup().
>
> No they do not exist.

I am aware.

>
> >
> > IOW, the inode_permission() in lookup_one_len() that eventfs does is
> > redundant and just wrong.
>
> I don't think so.

I'm very well aware that the dentries and inode aren't created during
mkdir but the completely directory layout is determined. You're just
splicing in dentries and inodes during lookup and readdir.

If mkdir /sys/kernel/tracing/instances/foo has succeeded and you later
do a lookup/readdir on

ls -al /sys/kernel/tracing/instances/foo/events

Why should the creation of the dentries and inodes ever fail due to a
permission failure? The vfs did already verify that you had the required
permissions to list entries in that directory. Why should filling up
/sys/kernel/tracing/instances/foo/events ever fail then? It shouldn't
That tracefs instance would be half-functional. And again, right now
that inode_permission() check cannot even fail.