Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Uros Bizjak
Date: Thu Oct 19 2023 - 13:21:41 EST


On Thu, Oct 19, 2023 at 7:00 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 19 Oct 2023 at 00:04, Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
> >
> > Let me explain how the compiler handles volatile.
>
> We're talking past each other.
>
> You are talking about the volatile *memory* ops, and the
> difference that "raw" vs "this" would cause with and without the
> "volatile".
>
> While *I* am now convinced that the memory ops aren't even an option,
> because they will generate worse code, because pretty much all users
> use the "this" version (which would have to use volatile),

Please see [1]. Even with volatile access, with memory ops the
compiler can propagate operands, resulting in a ~8k code size reduction
and many hundreds (if not thousands) of MOVs propagated into subsequent
instructions. Note the many code examples in [1]. This is not
possible with the asm variant.

[1] https://lore.kernel.org/lkml/20231004192404.31733-1-ubizjak@xxxxxxxxx/
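
To make the propagation argument concrete, here is a minimal user-space
sketch. It is not the kernel code: demo_counter and the demo_* helpers
are hypothetical names, and the %gs segment prefix is left out so the
sketch can run outside the kernel. With the asm variant the loaded value
must first land in a register of its own; with the memory-op variant the
compiler is allowed to fold the access into the instruction that
consumes it. Whether it actually does so depends on the compiler version
and flags.

```c
/* Hypothetical stand-in for a per-CPU variable (no %gs prefix here). */
static unsigned long demo_counter = 40;

/* asm variant: the load is opaque to the compiler, so the value has to
 * be moved into a register before any later instruction can use it. */
static inline unsigned long demo_read_asm(const unsigned long *p)
{
	unsigned long v;
#if defined(__x86_64__) || defined(__i386__)
	__asm__ ("mov %1, %0" : "=r" (v) : "m" (*p));
#else
	v = *p;		/* fallback so the sketch builds on other targets */
#endif
	return v;
}

/* memory-op variant: a plain C access, cast to volatile as in the
 * this_* case; the compiler may use it directly as a memory operand. */
static inline unsigned long demo_read_mem(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}

static unsigned long demo_add_asm(unsigned long x)
{
	return x + demo_read_asm(&demo_counter);
}

static unsigned long demo_add_mem(unsigned long x)
{
	return x + demo_read_mem(&demo_counter);
}
```

Both functions return the same value; the difference, if any, shows up
only in the generated code (compare the output of "gcc -O2 -S").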

> Because if we just stick with inline asms, the need for "volatile"
> simply goes away.

No. Without volatile, the compiler is free to remove or duplicate the
asm (and to apply other unwanted optimizations). Please see the end of
chapter 6.47.2.1 in [2].

[2] https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Extended-Asm.html#Volatile-1
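
As a rough illustration of the difference (a sketch only; demo_slot and
the demo_load_* helpers are made-up names, and again no %gs prefix is
used so this runs in user space): the two helpers below differ only in
the volatile qualifier on the asm. According to [2], the compiler may
merge, move or drop the plain asm when it believes the result cannot
change, while the volatile one is not deleted as long as it is
reachable.

```c
/* Hypothetical stand-in for a per-CPU variable. */
static unsigned long demo_slot = 7;

/* Plain asm: the compiler may treat two of these with identical inputs
 * as one computation, hoist it out of a loop, or emit it again. */
static inline unsigned long demo_load_plain(const unsigned long *p)
{
	unsigned long v;
#if defined(__x86_64__) || defined(__i386__)
	__asm__ ("mov %1, %0" : "=r" (v) : "m" (*p));
#else
	v = *p;		/* fallback so the sketch builds on other targets */
#endif
	return v;
}

/* asm volatile: each reachable statement is kept. */
static inline unsigned long demo_load_volatile(const unsigned long *p)
{
	unsigned long v;
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile ("mov %1, %0" : "=r" (v) : "m" (*p));
#else
	v = *(const volatile unsigned long *)p;
#endif
	return v;
}

/* Two back-to-back reads: candidates for merging in the plain case. */
static unsigned long demo_sum_twice_plain(void)
{
	unsigned long a = demo_load_plain(&demo_slot);
	unsigned long b = demo_load_plain(&demo_slot);

	return a + b;
}

static unsigned long demo_sum_twice_volatile(void)
{
	unsigned long a = demo_load_volatile(&demo_slot);
	unsigned long b = demo_load_volatile(&demo_slot);

	return a + b;
}
```

In a single thread both functions return the same sum. The difference
matters only when the location can change behind the compiler's back,
which this sketch does not try to reproduce.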

> The existing volatile on those percpu inline asms is *wrong*. It's a
> historical mistake.

Please see above.

> And with just a plain non-volatile inline asm, the inline asm wins.

Please see [1] for the code propagation argument.

> It doesn't have the (bad) read-once behavior of a volatile memory op.
>
> And it also doesn't have the (horrible correctness issue)
> rematerialization behavior of a non-volatile memory op.

Unfortunately, it does. Without volatile, an asm can be rematerialized
in the same way as it can be CSEd. OTOH, with the memory-ops approach
the access is cast to volatile in the this_* case, so it certainly
won't get rematerialized.
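
Roughly, the raw_* vs this_* distinction with memory ops looks like the
following sketch. The demo_* macros are hypothetical simplifications,
not the kernel's definitions: the real accessors also go through the
per-CPU segment, which is omitted here.

```c
/* Hypothetical stand-in for a per-CPU variable. */
static unsigned long demo_var = 5;

/* raw_* flavour: ordinary access, so the compiler may CSE the load or
 * perform it again later instead of keeping the value around. */
#define demo_raw_read(var)	(var)

/* this_* flavour: the access is cast to volatile, so every read in the
 * source corresponds to exactly one load in the generated code. */
#define demo_this_read(var)	(*(volatile __typeof__(var) *)&(var))

static unsigned long demo_twice_raw(void)
{
	unsigned long a = demo_raw_read(demo_var);
	unsigned long b = demo_raw_read(demo_var);

	return a + b;	/* may be compiled as a single load */
}

static unsigned long demo_twice_this(void)
{
	unsigned long a = demo_this_read(demo_var);
	unsigned long b = demo_this_read(demo_var);

	return a + b;	/* two loads, neither dropped nor repeated */
}
```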

> A compiler that were to rematerialize an inline asm (instead of
> spilling) would be a bad joke. That's not an optimization, that's just
> a crazy bad compiler with a code generation bug.

But that is what the compiler does without volatile.

Thanks,
Uros.