Re: [RFC] LKMM: Add volatile_if()

From: Linus Torvalds
Date: Wed Jun 09 2021 - 14:26:55 EST


On Wed, Jun 9, 2021 at 9:13 AM Marco Elver <elver@xxxxxxxxxx> wrote:
>
> I had a longer discussion with someone offline about it, and the
> problem with a builtin is similar to the "memory_order_consume
> implementation problem"

The "memory_order_consume" problem is *entirely* artificial, and due
to the C standards body's incompetence.

Really. I was there. Only very peripherally, but I was involved enough
to know what the problem was.

And the problem wasn't the concept of 'consume'. The problem was
entirely and 100% the incorrect model that the C standards people used
to describe the problem.

The C standards people took a "syntax and type based" approach to the
whole thing, and it was an utter disaster. It's the wrong model
entirely, because it became very very hard to describe the issue in
terms of optimizations of expressions and ordering at a syntactic
level.

What the standard _should_ have done, is to describe it in the same
terms that "volatile" is described - make all memory accesses "visible
in the virtual machine", and then specify the memory ordering
requirements within that virtual machine.

We have successful examples of that from other languages. I'm sorry if
this hurts some C language lawyers' fragile egos, but Christ, Java did
it better. Java! A language that a lot of people love to piss on. But
it did memory ordering fundamentally better.

And it's not like it would even have been a new concept. The notion of
"volatile" has been there since the very beginning of C. Yes, yes, the
C++ people screwed it up mightily and confused themselves about what
an "access" means. But "volatile" is actually a lot better specified
than the memory ordering requirements were, and the specifications are
(a) simpler and (b) much *much* easier for a compiler person to
understand.

Plus with memory ordering described as an operation - rather than as a
type - even the C++ confusion of volatile would have gone away. So the
very thing that likely made people want to avoid the "visible access
in the virtual machine" model didn't even _exist_ in the first place.

So the language committee pointlessly said "volatile is bad, we need
to do something else", and came up with something that was an order of
magnitude worse than volatile, and that simply _couldn't_ possibly
sanely handle that "problem of consume".

But the problem was always purely about the model used to _describe_
the issue being bad, not the issue itself.

The "consume" memory ordering is actually very easy to describe in the
"as if" virtual machine memory model (well, as easy as _any_ memory
ordering is). If the C standards committee hadn't picked the wrong way
to describe things, the problem simply would not exist.

Really.

And I guarantee you that compiler writers would have had an easier time
with that "virtual machine model" approach too. No, memory ordering
sure as hell isn't simple to understand for *anybody*, but it got
about a million times worse by using the wrong abstraction layer to
try to "explain" it.

It really is fairly easy to explain what "acquire" is at a virtual
machine model level. About as easy as memory ordering gets. For a
compiler writer, it basically turns into "you have to do the actual
access using XYZ, and then you can't move later memory operations to
before it". End of story.

So you can actually describe these things in a fairly straightforward
manner if you actually do it at that virtual machine level, because
that's literally the language that the hardware itself works at.

And then you could easily have defined "consume" as being the same
thing as "acquire", except that you can drop the special XYZ access
(fence, ld.acq, whatever) and replace it with a plain load if there
are only data dependencies on the loaded value (assuming, of course,
that your target hardware then supports that ordering requirement:
alpha would _always_ need the barrier).
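
A small C11 sketch of the kind of code this is about, assuming a
pointer-publish pattern. The names struct config, current, install()
and read_value() are hypothetical. The only use of the loaded pointer
that touches memory is the dereference c->value, which is a data
(address) dependency on the load. As far as I know, GCC and Clang
currently implement memory_order_consume by treating it as acquire, so
this shows the source-level shape rather than a guaranteed cheaper
instruction sequence.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
struct config {
    int value;
};

static _Atomic(struct config *) current;   /* starts out NULL */

/* Writer: initialize *c fully before calling, then publish it. */
void install(struct config *c)
{
    atomic_store_explicit(&current, c, memory_order_release);
}

/* Reader: the dereference depends on the loaded pointer value. */
int read_value(int fallback)
{
    struct config *c =
        atomic_load_explicit(&current, memory_order_consume);

    return c ? c->value : fallback;
}
```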

That could literally have been done as a peephole optimization, and a
compiler writer would never have had to even really worry about it.
Easy peasy. 99% of all compiler writers would not have to know
anything about the issue, there would be just one very special
optimization at the end that allows you to drop a barrier (or turn a
"ld.acq" into just an "ld") once you see all the uses of that loaded
value. A trivial peephole will handle 99% of all cases, and then for
the rest you just keep it as acquire.
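
A toy sketch of that peephole, written against a made-up
representation and not any real compiler's IR. Everything here
(struct load, enum use_kind, relax_consume_load(), and the
hw_orders_dependent_loads flag) is a hypothetical name chosen for
illustration. The rule it encodes is the one described above: if every
use of the loaded value is visible and is a data dependency, and the
target orders dependent loads, turn the acquire load into a plain
load; otherwise leave it as acquire.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy representation, for illustration only. */
enum use_kind {
    USE_DATA_DEP,   /* value feeds the address/data of a later access */
    USE_OTHER       /* anything else: branch condition, escape, ... */
};

enum load_op {
    LOAD_ACQUIRE,   /* "ld.acq", or a load followed by a barrier */
    LOAD_PLAIN      /* just "ld" */
};

struct load {
    enum load_op op;
    const enum use_kind *uses;   /* every known use of the value */
    size_t n_uses;
    bool all_uses_visible;       /* false if the value escapes */
};

/* Returns true if the load was downgraded to a plain load. */
bool relax_consume_load(struct load *ld, bool hw_orders_dependent_loads)
{
    /* Targets like alpha always need the barrier: keep acquire. */
    if (ld->op != LOAD_ACQUIRE || !hw_orders_dependent_loads)
        return false;

    /* If we cannot see all uses, keep it as acquire. */
    if (!ld->all_uses_visible)
        return false;

    for (size_t i = 0; i < ld->n_uses; i++) {
        if (ld->uses[i] != USE_DATA_DEP)
            return false;
    }

    ld->op = LOAD_PLAIN;
    return true;
}
```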

So anybody who tells you that "consume is complicated" is wrong.
Consume is *not* complicated. They've just chosen the wrong model to
describe it.

Look, memory ordering pretty much _is_ the rocket science of CS, but
the C standards committee basically made it a ton harder by specifying
"we have to make the rocket out of duct tape and bricks, and only use
liquid hydrogen as a propellant".

Linus