Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

From: Paul E. McKenney
Date: Wed Aug 21 2019 - 09:23:19 EST


On Wed, Aug 21, 2019 at 11:32:01AM +0100, Will Deacon wrote:
> On Tue, Aug 20, 2019 at 01:29:32PM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 20, 2019 at 03:56:12PM +0200, Peter Zijlstra wrote:
> > > On Sat, Aug 17, 2019 at 01:08:02AM -0700, Linus Torvalds wrote:
> > >
> > > > The data tearing issue is almost a non-issue. We're not going to add
> > > > WRITE_ONCE() to these kinds of places for no good reason.
> > >
> > > Paulmck actually has an example of that somewhere; ISTR that particular
> > > case actually got fixed by GCC, but I'd really _love_ for some compiler
> > > people (both GCC and LLVM) to state that their respective compilers will
> > > not do load/store tearing for machine word sized load/stores.
> >
> > I do very much recall such an example, but I am now unable to either
> > find it or reproduce it. :-/
> >
> > If I cannot turn it up in a few days, I will ask the LWN editors to
> > make appropriate changes to the "Who is afraid" article.
> >
> > > Without this written guarantee (which supposedly was in older GCC
> > > manuals but has since gone missing), I'm loathe to rely on it.
> > >
> > > Yes, it is very rare, but it is a massive royal pain to debug if/when it
> > > does do happen.
> >
> > But from what I can see, Linus is OK with use of WRITE_ONCE() for data
> > races on any variable for which there is at least one READ_ONCE().
> > So we can still use WRITE_ONCE() as we would like in our own code.
> > Yes, you or I might be hit by someone else's omission of WRITE_ONCE(),
> > it is better than the proverbial kick in the teeth.
> >
> > Of course, if anyone knows of a compiler/architecture combination that
> > really does tear stores of 32-bit constants, please do not keep it
> > a secret! After all, it would be good to get that addressed easily
> > starting now rather than after a difficult and painful series of
> > debugging sessions.
>
> It's not quite what you asked for, but if you look at the following
> silly code:
>
> typedef unsigned long long u64;
>
> struct data {
> u64 arr[1023];
> u64 flag;
> };
>
> void foo(struct data *x)
> {
> int i;
>
> for (i = 0; i < 1023; ++i)
> x->arr[i] = 0;
>
> x->flag = 0;
> }
>
> void bar(u64 *x)
> {
> *x = 0xabcdef10abcdef10;
> }
>
> Then arm64 clang (-O2) generates the following for foo:
>
> foo: // @foo
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> orr w2, wzr, #0x2000
> mov w1, wzr
> mov x29, sp
> bl memset
> ldp x29, x30, [sp], #16 // 16-byte Folded Reload
> ret
>
> and so the store to 'flag' has become part of the memset, which could
> easily be bytewise in terms of atomicity (and this isn't unlikely given
> we have a DC ZVA instruction which only guaratees bytewise atomicity).
>
> GCC (also -O2) generates the following for bar:
>
> bar:
> mov w1, 61200
> movk w1, 0xabcd, lsl 16
> stp w1, w1, [x0]
> ret
>
> and so it is using a store-pair instruction to reduce the complexity in
> the immediate generation. Thus, the 64-bit store will only have 32-bit
> atomicity. In fact, this is scary because if I change bar to:
>
> void bar(u64 *x)
> {
> *(volatile u64 *)x = 0xabcdef10abcdef10;
> }
>
> then I get:
>
> bar:
> mov w1, 61200
> movk w1, 0xabcd, lsl 16
> str w1, [x0]
> str w1, [x0, 4]
> ret
>
> so I'm not sure that WRITE_ONCE would even help :/

Well, I can have the LWN article cite your email, then. So thank you
very much!

Is generation of this code for a 64-bit volatile store considered a bug?
Or does ARMv8 exclude the possibility of 64-bit MMIO registers? And I
would guess that Thomas and Linus would ask a similar bugginess question
for normal stores. ;-)

> It's worth noting that:
>
> void baz(atomic_long *x)
> {
> atomic_store_explicit(x, 0xabcdef10abcdef10, memory_order_relaxed)
> }
>
> does the right thing:
>
> baz:
> mov x1, 61200
> movk x1, 0xabcd, lsl 16
> movk x1, 0xef10, lsl 32
> movk x1, 0xabcd, lsl 48
> str x1, [x0]
> ret

OK, the C11 and C++11 guys should be happy with this.

> Whilst these examples may be contrived, I do thing they illustrate that
> we can't simply say "stores to aligned, word-sized pointers are atomic".

And thank you again!

Thanx, Paul