Re: [PATCH 1/2] docs: rcu: Add cautionary note on plain-accesses to requirements

From: Paul E. McKenney
Date: Fri Aug 04 2023 - 13:27:47 EST

On Sat, Aug 05, 2023 at 12:33:03AM +0800, Alan Huang wrote:
> >> Yes, a write-write data race where the value is the same is also fine.
> >>
> >> But they are still data race, if the compiler is within its right to do anything it likes (due to data race),
> >> we still need WRITE_ONCE() in these cases, though it’s semantically safe.
> >>
> >> IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic.
> >
> > Volatile accesses are not specified very well in the standard. However,
> > as a practical matter, compilers that wish to be able to device drivers
> > (whether in kernels or userspace applications) must compile those volatile
> > accesses in such a way to allow reliable device drivers to be written.
> >
> >> However, the kernel consider the volatile access to be atomic, right?
> >
> > The compiler must therefore act as if a volatile access to an aligned
> > machine-word size location is atomic. To see this, consider accesses
> > to memory that is shared by a device driver and that device's firmware,
> > both of which are written in either C or C++.
> I learned these things a few months ago. But still thank you!
> The real problem is that there may be a data race at line 5, so Joel is correct that the compiler
> can cache the value loaded from line 5 according to the standard given that the standard says that
> a data race result in undefined behavior, so the compiler might be allowed to do anything. But from the
> perspective of the kernel, line 5 is likely a diagnostic read, so it’s fine without READ_ONCE() and the
> compiler is not allowed to cache the value.
> This situation is like the volatile access.
> Am I missing something?

I think you have it right.

The point is that we are sometimes more concerned about focusing KCSAN
diagnostics on the core concurrent algorithm, and are willing to take
the very low risk of messed-up diagnostic output in order to get simpler
and better KCSAN diagnostics on the main algorithm.

So in that case, we use data_race() on the diagnostics and other markings
in the main algorithm.

For example, suppose that we had a core algorithm that relied on strict
locking. In that case, we want to use unmarked plain C-language accesses
in the core algorithm, which will allow KCSAN to flag and accesses that
are not protected by the lock. But it might be bad for the diagnostic
code to acquire that lock, as this would suppress diagnostics in the case
where the lock was held for too long a time period. Using data_race()
in the diagnostic code addresses this situation.

Thanx, Paul

> > Does that help?
> >
> > Thanx, Paul
> >
> >> BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel.
> >>
> >>> result will not change the semantics of the program. But otherwise,
> >>> that's bad.
> >>>
> >>> [1]
> >>>
> >>> thanks,
> >>>
> >>> - Joel
> >>>
> >>>
> >>>>
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>> +plain accesses of a memory location with rcu_dereference() of the same memory
> >>>>>>> +location, in code involved in a data race.
> >>>>>>> +
> >>>>>>> In short, updaters use rcu_assign_pointer() and readers use
> >>>>>>> rcu_dereference(), and these two RCU API elements work together to
> >>>>>>> ensure that readers have a consistent view of newly added data elements.
> >>>>>>> --
> >>>>>>>