Re: [for-linus][PATCH 1/3] eventfs: Have the inodes all for files and directories all be the same

From: Steven Rostedt
Date: Mon Jan 22 2024 - 10:36:38 EST


On Mon, 22 Jan 2024 11:38:52 +0100
Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:

> Hi Stephen,

I don't know who "Stephen" is, but I'll reply to this message.

>
> On Wed, Jan 17, 2024 at 3:37 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > From: "Steven Rostedt (Google)" <rostedt@xxxxxxxxxxx>
> >
> > The dentries and inodes are created in the readdir for the sole purpose of
> > getting a consistent inode number. Linus stated that is unnecessary, and
> > that all inodes can have the same inode number. For a virtual file system
> > they are pretty meaningless.
> >
> > Instead use a single unique inode number for all files and one for all
> > directories.
> >
> > Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.localhome/

Yeah, Linus wanted me to try this first and see if there are any regressions.
Well, I guess you just answered that.

The above link has me saying to Linus:

It was me being paranoid that using the same inode number would break user
space. If that is not a concern, then I'm happy to just make it either the
same, or maybe just hash the ei and name that it is associated with.

> > Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@xxxxxxxxxxx
> >
> > Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> > Cc: Mark Rutland <mark.rutland@xxxxxxx>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> > Cc: Christian Brauner <brauner@xxxxxxxxxx>
> > Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> > Cc: Ajay Kaher <ajay.kaher@xxxxxxxxxxxx>
> > Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
>
> Thanks for your patch, which is now commit 53c41052ba312176 ("eventfs:
> Have the inodes all for files and directories all be the same") in
> v6.8-rc1, to which I have bisected the issue below.
>
> > --- a/fs/tracefs/event_inode.c
> > +++ b/fs/tracefs/event_inode.c
> > @@ -32,6 +32,10 @@
> > */
> > static DEFINE_MUTEX(eventfs_mutex);
> >
> > +/* Choose something "unique" ;-) */
> > +#define EVENTFS_FILE_INODE_INO 0x12c4e37
> > +#define EVENTFS_DIR_INODE_INO 0x134b2f5
> > +
> > /*
> > * The eventfs_inode (ei) itself is protected by SRCU. It is released from
> > * its parent's list and will have is_freed set (under eventfs_mutex).
> > @@ -352,6 +356,9 @@ static struct dentry *create_file(const char *name, umode_t mode,
> > inode->i_fop = fop;
> > inode->i_private = data;
> >
> > + /* All files will have the same inode number */
> > + inode->i_ino = EVENTFS_FILE_INODE_INO;
> > +
> > ti = get_tracefs(inode);
> > ti->flags |= TRACEFS_EVENT_INODE;
> > d_instantiate(dentry, inode);
> > @@ -388,6 +395,9 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
> > inode->i_op = &eventfs_root_dir_inode_operations;
> > inode->i_fop = &eventfs_file_operations;
> >
> > + /* All directories will have the same inode number */
> > + inode->i_ino = EVENTFS_DIR_INODE_INO;
> > +
> > ti = get_tracefs(inode);
> > ti->flags |= TRACEFS_EVENT_INODE;
>
> This confuses "find".
> Running "find /sys/" now prints lots of error messages to stderr:
>
> find: File system loop detected;
> ‘/sys/kernel/debug/tracing/events/initcall/initcall_finish’ is part of
> the same file system loop as
> ‘/sys/kernel/debug/tracing/events/initcall’.

So at a minimum, the directories need to have unique inode numbers.
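
To illustrate why shared directory inode numbers trip up "find": a tree walker typically decides it has hit a loop when a directory reports the same device and inode number as one of its own ancestors. The following is only a user-space sketch of that check, not the actual code in find; struct dir_id and looks_like_fs_loop() are names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the (st_dev, st_ino) pair that stat() reports. */
struct dir_id {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Return true if "dir" has the same device and inode number as any
 * directory already on the path leading down to it.
 */
static bool looks_like_fs_loop(const struct dir_id *ancestors, size_t depth,
			       const struct dir_id *dir)
{
	assert(dir != NULL);

	for (size_t i = 0; i < depth; i++) {
		if (ancestors[i].dev == dir->dev &&
		    ancestors[i].ino == dir->ino)
			return true;
	}
	return false;
}
```

With every eventfs directory reporting EVENTFS_DIR_INODE_INO, "events/initcall/initcall_finish" and its parent "events/initcall" produce the same pair, so a check like this one fires for every subdirectory.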


> find: File system loop detected;
> ‘/sys/kernel/debug/tracing/events/initcall/initcall_start’ is part of
> the same file system loop as
> ‘/sys/kernel/debug/tracing/events/initcall’.
> find: File system loop detected;
> ‘/sys/kernel/debug/tracing/events/initcall/initcall_level’ is part of
> the same file system loop as
> ‘/sys/kernel/debug/tracing/events/initcall’.
> [...]

Does this fix it for you? It takes the address of the eventfs_inode, adds a
hash of its name and a constant as salt, and runs the result through siphash
with a random key.
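
For anyone who wants to play with the idea outside the kernel, here is a rough user-space sketch of the same steps: hash the name, add it to the address and a constant, run it through a keyed mixer and keep the low 31 bits. The names fnv1a32(), toy_keyed_mix() and sketch_dir_ino() are made up for this example, and toy_keyed_mix() is only a stand-in built on the MurmurHash3 32-bit finalizer. It is not siphash and says nothing about whether addresses can leak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 32-bit string hash, the same algorithm as strhash() in the patch. */
static uint32_t fnv1a32(const char *s)
{
	uint32_t hash = 2166136261U;

	for (; *s; s++)
		hash = (hash ^ (unsigned char)*s) * 0x01000193U;
	return hash;
}

/*
 * Stand-in for siphash_1u32(): xor in a key, then apply the MurmurHash3
 * 32-bit finalizer. This is NOT siphash and is not cryptographically safe.
 */
static uint32_t toy_keyed_mix(uint32_t v, uint32_t key)
{
	v ^= key;
	v ^= v >> 16;
	v *= 0x85ebca6bU;
	v ^= v >> 13;
	v *= 0xc2b2ae35U;
	v ^= v >> 16;
	return v;
}

/* Combine address, name hash and a constant, mix, and keep 31 bits. */
static uint32_t sketch_dir_ino(uintptr_t addr, const char *name, uint32_t key)
{
	uint32_t v;

	assert(name != NULL);

	v = (uint32_t)addr + fnv1a32(name) + 0x134b2f5U;
	return toy_keyed_mix(v, key) & ((1U << 31) - 1);
}
```

Calling sketch_dir_ino() twice with the same address, name and key gives the same number, which is all that readdir and lookup need in order to agree with each other.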

Kees,

I'm using the eventfs_inode pointer to create a unique value for the inode.
But it's being salted, hashed and then truncated. Inode numbers are very easy
to read (although by default, only root has access to read these inodes), so
I want to be sure: the inode numbers themselves shouldn't be able to leak
kernel addresses, should they?

-- Steve

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 6795fda2af19..d54897b84596 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -19,6 +19,7 @@
#include <linux/namei.h>
#include <linux/workqueue.h>
#include <linux/security.h>
+#include <linux/siphash.h>
#include <linux/tracefs.h>
#include <linux/kref.h>
#include <linux/delay.h>
@@ -36,6 +37,31 @@ static DEFINE_MUTEX(eventfs_mutex);
#define EVENTFS_FILE_INODE_INO 0x12c4e37
#define EVENTFS_DIR_INODE_INO 0x134b2f5

+/* Used for making inode numbers */
+static siphash_key_t inode_key;
+
+/* Copied from scripts/kconfig/symbol.c */
+static unsigned strhash(const char *s)
+{
+ /* fnv32 hash */
+ unsigned hash = 2166136261U;
+ for (; *s; s++)
+ hash = (hash ^ *s) * 0x01000193;
+ return hash;
+}
+
+/* Just try to make something consistent and unique */
+static int eventfs_dir_ino(struct eventfs_inode *ei, const char *name)
+{
+ unsigned long sip = (unsigned long)ei;
+
+ sip += strhash(name) + EVENTFS_DIR_INODE_INO;
+ sip = siphash_1u32((u32)sip, &inode_key);
+
+ /* keep it positive */
+ return sip & ((1U << 31) - 1);
+}
+
/*
* The eventfs_inode (ei) itself is protected by SRCU. It is released from
* its parent's list and will have is_freed set (under eventfs_mutex).
@@ -396,7 +422,7 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
inode->i_fop = &eventfs_file_operations;

/* All directories will have the same inode number */
- inode->i_ino = EVENTFS_DIR_INODE_INO;
+ inode->i_ino = eventfs_dir_ino(ei, ei->name);

ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
@@ -802,7 +828,7 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)

name = ei_child->name;

- ino = EVENTFS_DIR_INODE_INO;
+ ino = eventfs_dir_ino(ei_child, name);

if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR))
goto out_dec;
@@ -932,6 +958,9 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
if (IS_ERR(dentry))
return ERR_CAST(dentry);

+ if (siphash_key_is_zero(&inode_key))
+ get_random_bytes(&inode_key, sizeof(inode_key));
+
ei = kzalloc(sizeof(*ei), GFP_KERNEL);
if (!ei)
goto fail_ei;