Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()

From: Uros Bizjak
Date: Wed Oct 18 2023 - 17:09:29 EST


On Wed, Oct 18, 2023 at 10:51 PM Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> On Wed, Oct 18, 2023 at 10:34 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 18 Oct 2023 at 13:22, Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > And yes, sometimes we use actual volatile accesses for them
> > > (READ_ONCE() and WRITE_ONCE()) but those are *horrendous* in general,
> > > and are much too strict. Not only does gcc generally lose its mind
> > > when it sees volatile (ie it stops doing various sane combinations
> > > that would actually be perfectly valid), but it obviously also stops
> > > doing CSE on the loads (as it has to).
> >
> > Note, in case you wonder what I mean by "lose its mind", try this
> > (extremely stupid) test program:
> >
> > void a(volatile int *i) { ++*i; }
> > void b(int *i) { ++*i; }
> >
> > and note that the non-volatile version does
> >
> > addl $1, (%rdi)
> >
> > but the volatile version then refuses to combine the read+write into a
> > rmw instruction, and generates
> >
> > movl (%rdi), %eax
> > addl $1, %eax
> > movl %eax, (%rdi)
> >
> > instead.
> >
> > Sure, it's correct, but it's an example of how 'volatile' ends up
> > disabling a lot of other optimizations than just the "don't remove the
> > access".
> >
> > Doing the volatile as one rmw instruction would still have been very
> > obviously valid - it's still doing a read and a write. You don't need
> > two instructions for that.
>
> FYI: This is the reason RMW instructions in percpu.h are not (blindly)
> converted to C ops. They will remain in their (volatile or not) asm
> form because of the above reason, and due to the fact that they don't
> combine with anything.
>
> > I'm not complaining, and I understand *why* it happens - compiler
> > writers very understandably go "oh, I'm not touching that".
> >
> > I'm just trying to point out that volatile really screws up code
> > generation even aside from the "access _exactly_ once" issue.
> >
> > So using inline asm and relying on gcc doing (minimal) CSE will then
> > generate better code than volatile ever could, even when we just use a
> > simple 'mov" instruction. At least you get that basic combining
> > effect, even if it's not great.
>
> Actually, RMW insns are better written in asm, while simple "mov"
> should be converted to (volatile or not) memory access. On x86 "mov"s
> from memory (reads) will combine nicely with almost all other
> instructions.

BTW: There was a discussion that GCC should construct RMW instructions
also when the memory location is marked volatile, but there was no
resolution reached. So, the "I'm not touching that" approach remains.
However, GCC *will* combine a volatile read with a follow-up
instruction.

Uros.