Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

From: Steven Rostedt
Date: Fri Aug 16 2019 - 13:05:09 EST


On Fri, 16 Aug 2019 17:48:59 +0100
Valentin Schneider <valentin.schneider@xxxxxxx> wrote:

> On 16/08/2019 17:25, Steven Rostedt wrote:
> >> Also, write and read to/from those variables should be done with
> >> WRITE_ONCE() and READ_ONCE(), given that those are read within tracing
> >> probes without holding the sched_register_mutex.
> >>
> >
> > I understand the READ_ONCE() but is the WRITE_ONCE() truly necessary?
> > It's done while holding the mutex. It's not that critical of a path,
> > and makes the code look ugly.
> >
>
> I seem to recall something like locking primitives don't protect you from
> store tearing / invented stores, so if you can have concurrent readers
> using READ_ONCE(), there should be a WRITE_ONCE() on the writer side, even
> if it's done in a critical section.

But for this, it really doesn't matter. The READ_ONCE() only matters for
going from 0->1 or 1->0; any other non-zero value is treated the same as 1.
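To make the 0->1 / 1->0 point concrete, here is a minimal userspace
sketch. It assumes (as the remark above implies) that the reader side
only tests whether each reference count is zero or non-zero. The names
model_probe_flags and MODEL_RECORD_* are hypothetical stand-ins, not the
kernel's own identifiers or values.

```c
/* Hypothetical flag values; the kernel has its own RECORD_* constants. */
enum { MODEL_RECORD_CMDLINE = 1, MODEL_RECORD_TGID = 2 };

/*
 * Simplified model of a probe turning the two reference counts into
 * "what to record" flags. The !! collapses any non-zero count to 1, so
 * observing 2 instead of 3 changes nothing; only a zero <-> non-zero
 * transition can change the result.
 */
static int model_probe_flags(int cmdline_ref, int tgid_ref)
{
	return (MODEL_RECORD_TGID * !!tgid_ref) +
	       (MODEL_RECORD_CMDLINE * !!cmdline_ref);
}
```

For example, model_probe_flags(1, 0) and model_probe_flags(3, 0) both
yield MODEL_RECORD_CMDLINE, which is why a stale or torn read of a
non-zero count is harmless here.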

When we enable trace events, we start recording the tasks' comms so
that we can map them to pids. When we disable trace events, we stop
recording the comms so that we don't overwrite the cache when it's not
needed. Note that if more tasks than the comm cache can hold are
recorded during a session, we are likely to miss comms anyway.

Thinking about this more, the READ_ONCE() and WRITE_ONCE() are not even
needed, because this is just best effort anyway.

The only real fix was to move the check into the mutex-protected area,
because a race there can cause a real bug.

{
- bool sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
+ bool sched_register;
+
mutex_lock(&sched_register_mutex);
+ sched_register = (!sched_cmdline_ref && !sched_tgid_ref);
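A minimal userspace sketch of a start/stop pair with the check done
under the lock, using a pthread mutex in place of sched_register_mutex.
All model_* names are hypothetical, and registering or unregistering the
probe is reduced to counters so the effect can be observed. If the check
were sampled before taking the lock instead, two concurrent starters
could both see zero counts and both register, or a starter could see a
non-zero count just before a stopper drops it to zero and unregisters,
leaving a user with no probe registered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for sched_cmdline_ref, sched_tgid_ref and
 * sched_register_mutex. */
static int model_cmdline_ref;
static int model_tgid_ref;
static int model_register_calls;   /* times the probe would be registered */
static int model_unregister_calls; /* times it would be unregistered */
static pthread_mutex_t model_register_mutex = PTHREAD_MUTEX_INITIALIZER;

enum model_ops { MODEL_OP_CMDLINE, MODEL_OP_TGID };

static void model_start(enum model_ops ops)
{
	bool do_register;

	pthread_mutex_lock(&model_register_mutex);
	/* Sample "nothing registered yet" only while holding the mutex, so
	 * no concurrent start/stop can change the counts in between. */
	do_register = (!model_cmdline_ref && !model_tgid_ref);

	if (ops == MODEL_OP_CMDLINE)
		model_cmdline_ref++;
	else
		model_tgid_ref++;

	/* First user of either kind: register the probe. */
	if (do_register)
		model_register_calls++;
	pthread_mutex_unlock(&model_register_mutex);
}

static void model_stop(enum model_ops ops)
{
	pthread_mutex_lock(&model_register_mutex);
	if (ops == MODEL_OP_CMDLINE) {
		assert(model_cmdline_ref > 0);
		model_cmdline_ref--;
	} else {
		assert(model_tgid_ref > 0);
		model_tgid_ref--;
	}

	/* Last user of either kind is gone: unregister the probe. */
	if (!model_cmdline_ref && !model_tgid_ref)
		model_unregister_calls++;
	pthread_mutex_unlock(&model_register_mutex);
}
```

Because both the check and the update happen under the same mutex, each
0 -> non-zero transition registers exactly once and each non-zero -> 0
transition unregisters exactly once, regardless of how callers interleave.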

Thus, I'd like to see a v2 of this patch without the READ_ONCE() or
WRITE_ONCE() added.

-- Steve