Re: [RFC] LKMM: Add volatile_if()

From: Marco Elver
Date: Mon Jun 07 2021 - 13:05:23 EST


On Mon, Jun 07, 2021 at 08:28AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 07, 2021 at 10:27:10AM +0200, Marco Elver wrote:
> > On Mon, 7 Jun 2021 at 10:02, Alexander Monakov <amonakov@xxxxxxxxx> wrote:
> > > On Sun, 6 Jun 2021, Linus Torvalds wrote:
> > [...]
> > > > On Sun, Jun 6, 2021 at 2:19 PM Alexander Monakov <amonakov@xxxxxxxxx> wrote:
> > [...]
> > > > Btw, since we have compiler people on line, the suggested 'barrier()'
> > > > isn't actually perfect for this particular use:
> > > >
> > > > #define barrier() __asm__ __volatile__("" : : "i" (__COUNTER__) : "memory")
> > > >
> > > > in the general barrier case, we very much want to have that "memory"
> > > > clobber, because the whole point of the general barrier case is that
> > > > we want to make sure that the compiler doesn't cache memory state
> > > > across it (ie the traditional use was basically what we now use
> > > > "cpu_relax()" for, and you would use it for busy-looping on some
> > > > condition).
> > > >
> > > > In the case of "volatile_if()", we actually would like to have not a
> > > > memory clobber, but a "memory read". IOW, it would be a barrier for
> > > > any writes taking place, but reads can move around it.
> > > >
> > > > I don't know of any way to express that to the compiler. We've used
> > > > hacks for it before (in gcc, BLKmode reads turn into that kind of
> > > > barrier in practice, so you can do something like make the memory
> > > > input to the asm be a big array). But that turned out to be fairly
> > > > unreliable, so now we use memory clobbers even if we just mean "reads
> > > > random memory".
> > >
> > > So the barrier which is a compiler barrier but not a machine barrier is
> > > __atomic_signal_fence(model), but internally GCC will not treat it smarter
> > > than an asm-with-memory-clobber today.
> >
> > FWIW, Clang seems to be cleverer about it, and seems to do the optimal
> > thing if I use a __atomic_signal_fence(__ATOMIC_RELEASE):
> > https://godbolt.org/z/4v5xojqaY
>
> Indeed it does! But I don't know of a guarantee for that helpful
> behavior.

Is there a way we can interpret the standard in such a way that it
should be guaranteed?

If yes, it should be easy to add tests to the compiler repos for
snippets that the Linux kernel relies on (if we decide to use
__atomic_signal_fence() for this).

If no, we can still try to add tests to the compiler repos, but may
receive some push-back at the very latest when some optimization pass
decides to break it. Because the argument then is that it's well within
the language standard.

Adding language extensions will likely be met with resistance, because
some compiler folks are afraid of creating language forks (the reason
why we have '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang').
That could be solved if we declare Linux-C a "standard", and finally get
-std=linux or such, at which point asking for "volatile if" directly
would probably be easier without jumping through hoops.

The jumping-through-hoops variant would probably be asking for a
__builtin primitive that allows constructing volatile_if() (if we can't
bend existing primitives to do what we want).

Thanks,
-- Marco