Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next

From: Marco Elver
Date: Thu Jan 30 2020 - 08:39:53 EST


On Wed, 29 Jan 2020 at 19:40, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, Jan 29, 2020 at 04:29:43PM +0100, Marco Elver wrote:
>
> > On Tue, 28 Jan 2020 at 17:52, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > I'm claiming that in the first case, the only thing that's ever done
> > > with a racy load is comparing against 0, there is no possible bad
> > > outcome ever. While obviously if you let the load escape, or do anything
> > > other than compare against 0, there is.
> >
> > It might sound like a simple rule, but implementing this is anything
> > but simple: This would require changing the compiler,
>
> Right.
>
> > which we said we'd like to avoid as it introduces new problems.
>
> Ah, I missed that brief.
>
> > This particular rule relies on semantic analysis that is beyond what
> > the TSAN instrumentation currently supports. Right now we support GCC
> > and Clang; changing the compiler probably means we'd end up with only
> > one (probably Clang), and many more years before the change has
> > propagated to the majority of used compiler versions. It'd be good if
> > we can do this purely as a change in the kernel's codebase.
>
> *sigh*, I didn't know there was such a resistance to change the tooling.
> That seems very unfortunate :-/

Unfortunately. Just wanted to highlight what to expect if we go down
that path. We can put it on a nice-to-have list, but don't expect or
rely on it to happen soon, given the implications above.

> > Keeping the bigger picture in mind, how frequent is this case, and
> > what are we really trying to accomplish?
>
> It's trying to avoid the RmW pulling the line in exclusive/modified
> state in a loop. The basic C-CAS pattern if you will.
>
> > Is it only to avoid a READ_ONCE? Why is the READ_ONCE bad here? If
> > there is a racing access, why not be explicit about it?
>
> It's probably not terrible to put a READ_ONCE() there; we just need to
> make sure the compiler doesn't do something stupid (it is known to do
> stupid when 'volatile' is present).

Maybe we need to optimize READ_ONCE().

'if (data_race(..))' would also work here and has no cost.

> But the fact remains that it is entirely superfluous, there is no
> possible way the compiler can wreck this.

Agree. Still thinking if there is a way to do it without changing the
compiler, but I can't see it right now. :/

Thanks,
-- Marco