Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

From: Peter Zijlstra
Date: Tue Jul 04 2023 - 05:47:04 EST


On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:

> int x = 0;
> int y = 0;
> int r0, r1;
>
> int dummy;
>
> void t0(void)
> {
>         __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
>
>         __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
>         __atomic_thread_fence(__ATOMIC_SEQ_CST);
>
>         r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
> }
>
> void t1(void)
> {
>         __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
>         __atomic_thread_fence(__ATOMIC_SEQ_CST);
>         r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
> }
>
> // BUG_ON(r0 == 0 && r1 == 0)
>
> On x86-64 (gcc 13.1 -O2) we get:
>
> t0():
>         movl    $1, x(%rip)
>         movl    $1, %eax
>         xchgl   dummy(%rip), %eax
>         lock orq $0, (%rsp)     ;; Redundant with previous exchange.
>         movl    y(%rip), %eax
>         movl    %eax, r0(%rip)
>         ret
> t1():
>         movl    $1, y(%rip)
>         lock orq $0, (%rsp)
>         movl    x(%rip), %eax
>         movl    %eax, r1(%rip)
>         ret

So I would expect the compilers to do better here. They should know those
__atomic_thread_fence() thingies are superfluous and simply not emit
them. This could even be done as a late peephole pass that sees
consecutive atomic ops and notices the second one is a no-op.
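
I.e. the pattern to catch is something like this (a sketch only; pub() is
a made-up name, and the "hoped for" asm is what I'd want a compiler to
emit, not what any of them does today):

  int x, dummy;

  void pub(void)
  {
          __atomic_store_n(&x, 1, __ATOMIC_RELAXED);

          /* XCHG carries an implicit LOCK prefix on x86-64 and is
           * already a full memory barrier ... */
          __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);

          /* ... so this fence adds nothing here; a peephole that sees
           * it right after the LOCK'ed RMW can simply not emit it. */
          __atomic_thread_fence(__ATOMIC_SEQ_CST);
  }

  /* Hoped-for x86-64 output; the GCC output above minus the LOCK ORQ
   * (and minus the MFENCE for clang):
   *
   *         movl    $1, x(%rip)
   *         movl    $1, %eax
   *         xchgl   dummy(%rip), %eax
   *         ret
   */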

> On x86-64 (clang 16 -O2) we get:
>
> t0():
>         movl    $1, x(%rip)
>         movl    $1, %eax
>         xchgl   %eax, dummy(%rip)
>         mfence                  ;; Redundant with previous exchange.

And that's just terrible :/ Nobody should be using MFENCE for this; the
only thing MFENCE buys you over a LOCK-prefixed instruction is ordering
for MMIO and other such 'lovely' things, and I don't think C++ atomics
cover those anyway. And emitting MFENCE right after a LOCK-prefixed
instruction (implicit in the XCHG in this case) is just fail.
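
If a compiler really wants a standalone full barrier there, a LOCK'ed RMW
on the stack -- like the LOCK ORQ GCC emits above, or the LOCK ADDL the
kernel uses for smp_mb() -- is the saner choice. Roughly (a sketch only;
full_fence() is a made-up helper name):

  /* Full barrier via a LOCK'ed no-op RMW on the stack, as a stand-in
   * for MFENCE when implementing __atomic_thread_fence(__ATOMIC_SEQ_CST)
   * on x86-64.  The extra strength of MFENCE only matters for MMIO,
   * non-temporal accesses and friends, which C11/C++11 atomics don't
   * cover anyway. */
  static inline void full_fence(void)
  {
          __asm__ __volatile__("lock; orq $0,(%%rsp)" ::: "memory", "cc");
  }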

>         movl    y(%rip), %eax
>         movl    %eax, r0(%rip)
>         retq
> t1():
>         movl    $1, y(%rip)
>         mfence
>         movl    x(%rip), %eax
>         movl    %eax, r1(%rip)
>         retq