Re: [PATCH] tracing/user_events: Run BPF program if attached

From: Linus Torvalds
Date: Wed May 17 2023 - 14:15:40 EST


On Wed, May 17, 2023 at 10:22 AM Beau Belgrave
<beaub@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, May 16, 2023 at 08:03:09PM -0700, Linus Torvalds wrote:
> > So what is it that could even race and change the list that is the
> > cause of that rcu-ness?
>
> Processes that fork() with previous user_events need to be duplicated.

BS.

Really. Stop making stuff up.

The above statement is clearly not true - just LOOK AT THE CODE.

Here's the loop in question:

list_for_each_entry_rcu(enabler, &mm->enablers, link) {
if (enabler->event == user) {
attempt = 0;
user_event_enabler_write(mm, enabler,
true, &attempt);
}
}

and AT THE VERY TOP OF user_event_enabler_write() we have this:

lockdep_assert_held(&event_mutex);

so either nobody has ever tested this code with lockdep enabled, or we
hold that lock.

And if nobody has ever tested the code, then it's broken anyway. That
code N#EEDS the mutex lock. It needs to stop thinking it's RCU-safe,
when it clearly isn't.

So I ask again: why is that code using RCU list traversal, when it
already holds the lock that makes the RCU'ness COMPLETELY POINTLESS.

And again, that pointless RCU locking around this all seems to be the
*only* reason for all these issues with pin_user_pages_remote().

So I claim that this code is garbage. Somebody didn't think about locking.

Now, it's true that during fork, we have *another* RCU loop, but that
one is harmless: that's not the one that does all this page pinning.

Now, that one *does* do

list_add_rcu(&enabler->link, &mm->enablers);

without actually holding any locks, but in this case 'mm' is a newly
allocated private thing of a task that hasn't even been exposed to the
world yet, so nobody should be able to even see it. So that code lacks
the proper locking for the new list, but it does so because there is
nothing that can race with the new list (and the old list is
read-only, so RCU traversal of the old list works).

So that "list_add_rcu()" there could probably be just a "list_add()",
with a comment saying "this is new, nobody can see it".

And if something *can* race it it and can see the new list, then it
had damn well needs that mutex lock anyway, because that "something"
could be actually modifying it. But that's separate from the page
pinning situation.

So again, I claim that the RCU'ness of the pin_user_pages part is
broken and should simply not exist.

> > Other code in that file happily just does
> >
> > mutex_lock(&event_mutex);
> >
> > list_for_each_entry_safe(enabler, next, &mm->enablers, link)
> >
> > with no RCU anywhere. Why does user_event_enabler_update() not do that?
>
> This is due to the fork() case above without taking the event_mutex.

See above. Your thinking is confused, and the code is broken.

If somebody can see the new list while it is created during fork(),
then you need the event_mutex to protect the creation of it.

And if nobody can see it, then you don't need any RCU protection against it.

Those are the two choices. You can't have it both ways.

> > Oh, and even those other loops are a bit strange. Why do they use the
> > "_safe" variant, even when they just traverse the list without
> > changing it? Look at user_event_enabler_exists(), for example.
>
> The other places in the code that do this either will remove the event
> depending on the situation during the for_each, or they only hold the
> register lock and don't hold the event_mutex.

So?

That "safe" variant doesn't imply any locking. It does *not* protect
against events being removed. It *purely* protects against the loop
itself removing entries.

So this code:

list_for_each_entry_safe(enabler, next, &mm->enablers, link) {
if (enabler->addr == uaddr &&
(enabler->values & ENABLE_VAL_BIT_MASK) == bit)
return true;
}

is simply nonsensical. There is no reason for the "safe". It does not
make anything safer.

The above loop is only safe under the mutex (it would need to be the
"rcu" variant to be safe to traverse without locking), and since it
isn't modifying the list, there's no reason for the safe.

End result: the "safe" part is simply wrong.

If the intention is "rcu" because of lack of locking, then the code needs to
(a) get the rcu read lock
(b) use the _rcu variant of the list traversal

And if the intention is that it's done under the proper 'event_mutex'
lock, then the "safe" part should simply be dropped.

Linus