Re: [linus:master] [eventfs] 852e46e239: BUG:unable_to_handle_page_fault_for_address

From: Linus Torvalds
Date: Mon Jan 29 2024 - 12:40:35 EST


On Mon, 29 Jan 2024 at 09:01, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> Thanks for the analysis. I have a patch that removes the dropping of the
> mutex over the create_dir/file() calls, and lockdep hasn't complained about
> it.
>
> I was going to add that to my queue for the next merge window along with
> other clean ups but this looks like it actually fixes a real bug. I'll move
> that over to my urgent queue and start testing it.

Note that it is *not* enough to just keep the mutex.

All the *users* need to get the mutex too.

Otherwise you have this:

CPU1:

create_dir_dentry():
mutex locked the whole time..
dentry = create_dir(ei, parent);
does d_instantiate(dentry, inode);
eventfs_post_create_dir(ei);
dentry->d_fsdata = ei;
mutex dropped;

but CPU2 can still come in, see the dentry immediately after the
"d_instantiate()", and do an "open()" or "stat()" on the dentry (which
will *not* cause a 'lookup()', since it's in the dentry cache), and
that will then cause either

->permission() -> eventfs_permission() -> set_top_events_ownership()

or

->get_attr() -> eventfs_get_attr() -> set_top_events_ownership()

and both of those will now do the dentry->inode->ei lookup. And
neither of them takes the mutex.

So then it doesn't even matter that you didn't drop the mutex in the
middle, because the users simply won't be serializing with it anyway.

So you'd have to make set_top_events_ownership() take the mutex around
it all too.

In fact, pretty much *any* use of "ti->private" needs the mutex.

Which is obviously a bit painful.

Honestly, I think the right model is to just make sure that the inode
is fully initialized when you do 'd_instantiate()'

The patch looks obvious, and I think this actually fixes *another*
bug, namely that the old

ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;

was buggy, because 'ti->flags' was uninitialized before.

eventfs_create_events_dir() seems to have the same bug with ti->flags,
btw, but got the ti->private initialization right.

Funnily enough, create_file() got this right. I don't even understand
why create_dir() did what it did.

IOW, I think the right fix is really just this:

--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -328,7 +328,8 @@
inode->i_ino = EVENTFS_FILE_INODE_INO;

ti = get_tracefs(inode);
- ti->flags |= TRACEFS_EVENT_INODE;
+ ti->flags = TRACEFS_EVENT_INODE;
+ ti->private = ei;
d_instantiate(dentry, inode);
fsnotify_create(dentry->d_parent->d_inode, dentry);
return eventfs_end_creating(dentry);
@@ -513,7 +514,6 @@
static void eventfs_post_create_dir(struct eventfs_inode *ei)
{
struct eventfs_inode *ei_child;
- struct tracefs_inode *ti;

lockdep_assert_held(&eventfs_mutex);

@@ -523,9 +523,6 @@
srcu_read_lock_held(&eventfs_srcu)) {
ei_child->d_parent = ei->dentry;
}
-
- ti = get_tracefs(ei->dentry->d_inode);
- ti->private = ei;
}

/**
@@ -943,7 +940,7 @@
INIT_LIST_HEAD(&ei->list);

ti = get_tracefs(inode);
- ti->flags |= TRACEFS_EVENT_INODE | TRACEFS_EVENT_TOP_INODE;
+ ti->flags = TRACEFS_EVENT_INODE | TRACEFS_EVENT_TOP_INODE;
ti->private = ei;

inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;

which fixes the initialization errors with the 'ti' fields.

Now, I do agree that your locking is strange, and that should be fixed
*too*, but I think the above is the "right" fix for this particular
issue.

Linus