Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

From: Olivier Dion
Date: Tue Jul 04 2023 - 13:19:39 EST

Next message: Rafael J. Wysocki: "Re: [PATCH] ACPICA: actbl2: change to be16/be32 types for big endian data"
Previous message: Dmitry Rokosov: "Re: [PATCH v1 5/5] arm64: dts: meson: a1: change uart compatible string"
In reply to: Alan Stern: "Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics"
Next in thread: Alan Stern: "Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics"

On Mon, 03 Jul 2023, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
>> This is a request for comments on extending the atomic builtins API to
>> help avoiding redundant memory barriers. Indeed, there are
>
> What atomic builtins API are you talking about? The kernel's? That's
> what it sounded like when I first read this sentence -- why else post
> your message on a kernel mailing list?

Good point, we meant the `__atomic' builtins from GCC and Clang. Sorry
for the confusion.

[...]

>> fully-ordered atomic operations like xchg and cmpxchg success in LKMM
>> have implicit memory barriers before/after the operations [1-2], while
>> atomic operations using the __ATOMIC_SEQ_CST memory order in C11/C++11
>> do not have any ordering guarantees of an atomic thread fence
>> __ATOMIC_SEQ_CST with respect to other non-SEQ_CST operations [3].
>
> After reading what you wrote below, I realized that the API you're
> thinking of modifying is the one used by liburcu for user programs.
> It's a shame you didn't mention this in either the subject line or the
> first few paragraphs of the email; that would have made understanding
> the message a little easier.

Indeed, our intent is to discuss the Userspace RCU uatomic API, and how
extending the toolchain's atomic builtins could help it, not the LKMM
itself. We reached out to the Linux kernel developers because the
original Userspace RCU uatomic API is based on the LKMM.

> In any case, your proposal seems reasonable to me at first glance, with
> two possible exceptions:
>
> 1. I can see why you have special fences for before/after load,
> store, and rmw operations. But why clear? In what way is
> clearing an atomic variable different from storing a 0 in it?

We could indeed group the clear with the store.

We had two approaches in mind:

a) A before/after pair by category of operation:

- load
- store
- RMW

b) A before/after pair for every operation:

- load
- store
- exchange
- compare_exchange
- {add,sub,and,xor,or,nand}_fetch
- fetch_{add,sub,and,xor,or,nand}
- test_and_set
- clear
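
One way to picture how the operations listed in b) would collapse into
the categories of a) is the small C sketch below. The enum and function
names are purely illustrative (they are not part of liburcu or of any
compiler API), and clear is folded into the store category as discussed
above.

```c
#include <assert.h>

/* Hypothetical names: one entry per operation from list b). */
enum op {
	OP_LOAD,
	OP_STORE,
	OP_EXCHANGE,
	OP_COMPARE_EXCHANGE,
	OP_OP_FETCH,		/* {add,sub,and,xor,or,nand}_fetch */
	OP_FETCH_OP,		/* fetch_{add,sub,and,xor,or,nand} */
	OP_TEST_AND_SET,
	OP_CLEAR,
};

/* Hypothetical names: one entry per category from list a). */
enum category {
	CAT_LOAD,
	CAT_STORE,
	CAT_RMW,
};

/* Map a fine-grained operation to its before/after fence category. */
static enum category category_of(enum op op)
{
	switch (op) {
	case OP_LOAD:
		return CAT_LOAD;
	case OP_STORE:
	case OP_CLEAR:		/* clear is just a store of 0 */
		return CAT_STORE;
	case OP_EXCHANGE:
	case OP_COMPARE_EXCHANGE:
	case OP_OP_FETCH:
	case OP_FETCH_OP:
	case OP_TEST_AND_SET:
		return CAT_RMW;
	}
	assert(0 && "unknown operation");
	return CAT_RMW;
}
```

With a), a single before/after fence pair per category must be strong
enough for every operation that maps to it.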

If we go for the grouping in a), we have to take into account that the
barriers emitted need to cover the worst-case scenario. As an example,
Clang can emit a store for an exchange with SEQ_CST on x86-64 if the
returned value is not used.

Therefore, for the grouping in a), all RMW would need to emit a memory
barrier (with Clang on x86-64). But with the scheme in b), we can emit
the barrier explicitly for the exchange operation. However, we question
the usefulness of this kind of compiler optimization, since a user who
does not need the returned value should use a store operation instead.
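
To make the exchange case concrete, here is a minimal sketch using the
GCC/Clang `__atomic' builtins. The helper names are made up for
illustration. Both helpers leave the same value in memory; what may
differ, depending on the compiler and target, is the instruction
selected when the result of the exchange is discarded.

```c
#include <assert.h>

/*
 * RMW whose result is thrown away. Since nothing consumes the old
 * value, a compiler is free to lower this to a SEQ_CST store.
 */
static inline void set_via_exchange(int *p, int v)
{
	(void) __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
}

/* What a caller who does not need the old value should write. */
static inline void set_via_store(int *p, int v)
{
	__atomic_store_n(p, v, __ATOMIC_SEQ_CST);
}

/* Hypothetical demo: both paths end with the same stored value. */
static int demo_exchange_vs_store(void)
{
	int a = 0, b = 0;

	set_via_exchange(&a, 42);
	set_via_store(&b, 42);
	assert(a == b);
	return __atomic_load_n(&a, __ATOMIC_RELAXED);
}
```

Comparing the assembly generated for the two helpers on a given
compiler and target is a quick way to check whether they are lowered
to the same instruction.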

> 2. You don't have a special fence for use after initializing an
> atomic. This operation can be treated specially, because at the
> point where an atomic is initialized, it generally has not yet
> been made visible to any other threads.

I assume that you're referring to something like std::atomic_init,
introduced in C++11 and deprecated in C++20? I do not see any scenario on any
architecture where a compiler would emit an atomic operation for the
initialization of an atomic variable. If a memory barrier is required
in this situation, then an explicit one can be emitted using the
existing API.

In our case -- with the compiler's atomic builtins -- the initialization
of a variable can be done without any atomic operations and does not
require any memory barrier. This is a consequence of being able to
work with integral scalar or pointer types that carry no atomic
qualifier.
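
As a minimal sketch of what we mean (the struct and function names are
hypothetical, and release/acquire is only one possible choice of
ordering for the publication): the object is initialized with plain
stores while it is still private, and only the store that makes it
reachable by other threads uses an atomic builtin.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int value;
};

/* Plain pointer: no _Atomic qualifier is needed for the builtins. */
static struct node *shared_node;

/* Initialize privately with plain stores, then publish atomically. */
static void publish_node(struct node *n, int value)
{
	n->value = value;	/* not yet visible to any other thread */
	__atomic_store_n(&shared_node, n, __ATOMIC_RELEASE);
}

/* Reader side: returns -1 if nothing has been published yet. */
static int read_published_value(void)
{
	struct node *n = __atomic_load_n(&shared_node, __ATOMIC_ACQUIRE);

	assert(n == NULL || n == shared_node);
	return n ? n->value : -1;
}
```

Any ordering that is needed is attached to the publication, not to the
initialization itself.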

> Therefore the fence which would normally appear after a store (or
> clear) generally need not appear after an initialization, and you
> might want to add a special API to force the generation of such a
> fence.

I am puzzled by this. Initialization of a shared variable does not need
to be atomic until its publication. Could you expand on this?

Thanks for the feedback,
Olivier

--
Olivier Dion
EfficiOS Inc.
https://www.efficios.com
