Re: [PATCH] tracefs/eventfs: Use root and instance inodes as default ownership

From: Christian Brauner
Date: Wed Jan 10 2024 - 06:51:53 EST


On Mon, Jan 08, 2024 at 10:23:31AM -0500, Steven Rostedt wrote:
> On Mon, 8 Jan 2024 12:04:54 +0100
> Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> > > > IOW, the inode_permission() in lookup_one_len() that eventfs does is
> > > > redundant and just wrong.
> > >
> > > I don't think so.
> >
> > I'm very well aware that the dentries and inode aren't created during
> > mkdir but the completely directory layout is determined. You're just
> > splicing in dentries and inodes during lookup and readdir.
> >
> > If mkdir /sys/kernel/tracing/instances/foo has succeeded and you later
> > do a lookup/readdir on
> >
> > ls -al /sys/kernel/tracing/instances/foo/events
> >
> > Why should the creation of the dentries and inodes ever fail due to a
> > permission failure?
>
> They shouldn't.
>
> > The vfs did already verify that you had the required
> > permissions to list entries in that directory. Why should filling up
> > /sys/kernel/tracing/instances/foo/events ever fail then? It shouldn't
> > That tracefs instance would be half-functional. And again, right now
> > that inode_permission() check cannot even fail.
>
> And it shouldn't. But without dentries and inodes, how does VFS know what
> is allowed to open the files?

So say you do:

mkdir /sys/kernel/tracing/instances/foo

After this has returned we know everything we need to know about the new
tracefs instance including the ownership and the mode of all inodes in
/sys/kernel/tracing/instances/foo/events/* and below precisely because
ownership is always inherited from the parent dentry and recorded in the
metadata struct eventfs_inode.

So say someone does:

open("/sys/kernel/tracing/instances/foo/events/xfs");

and say this is the first time that someone accesses that events/
directory.

When the open pathwalk is done, the vfs will determine via

[1] may_lookup(inode_of(events))

whether you are able to list entries such as "xfs" in that directory.
The vfs checks inode_permission(MAY_EXEC) on "events" and if that holds
it ends up calling i_op->eventfs_root_lookup(events).

At this point tracefs/eventfs adds the inodes for all entries in that
"events" directory including "xfs" based on the metadata it recorded
during the mkdir. Since now someone is actually interested in them. And
it initializes the inodes with ownership and everything and adds the
dentries that belong into that directory.

Nothing here depends on the permissions of the caller. The only
permission that mattered was done in the VFS in [1]. If the caller has
permissions to enter a directory they can lookup and list its contents.
And its contents where determined/fixed etc when mkdir was called.

So we just need to add the required objects into the caches (inode,
dentry) whose addition we intentionally defered until someone actually
needed them.

So, eventfs_root_lookup() now initializes the inodes with the ownership
from the stored metadata or from the parent dentry and splices in inodes
and dentries. No permission checking is needed for this because it is
always a recheck of what the vfs did in [1].

We now return to the vfs and path walk continues to the final component
that you actually want to open which is that "xfs" directory in this
example. We check the permissions on that inode via may_open("xfs") and
we open that directory returning an fd to userspace ultimately.

(I'm going by memory since I need to step out the door.)