Re: [PATCH] locking/atomic: atomic: Use arch_atomic_{read,set} in generic atomic ops

From: Jonas Oberhauser
Date: Wed Feb 01 2023 - 05:52:04 EST




On 1/31/2023 11:03 PM, Boqun Feng wrote:
On Tue, Jan 31, 2023 at 04:08:29PM +0100, Jonas Oberhauser wrote:

On 1/30/2023 7:38 PM, Boqun Feng wrote:
On Mon, Jan 30, 2023 at 01:23:28PM +0100, Jonas Oberhauser wrote:
On 1/27/2023 11:09 PM, Boqun Feng wrote:
On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
I also noticed that GCC has some builtin/extension to do such things,
__atomic_OP_fetch and __atomic_fetch_OP, but I do not know if this
can be used in the kernel.
On a per-architecture basis only. The C/C++ memory model does not match
the Linux Kernel memory model, so using the compiler to generate the
atomic ops is somewhat tricky and needs architecture audits.
Hijacking this thread a little bit, but while we are at it, do you think it
makes sense that we have a config option that allows archs to
implement LKMM atomics via C11 (volatile) atomics? I know there are gaps
between the two memory models, but the option is only for fallback/generic
implementations, so we can put in extra barriers/orderings to make things
guaranteed to work.


[...]
I'm also curious whether link time optimization can resolve the inlining
issue?

For Rust case, cross-language LTO is needed I think, and last time I
tried, it didn't work.
In German we say "Was noch nicht ist kann ja noch werden", translated as
"what isn't can yet become". I don't feel like putting too much effort into
Not too much compared to wrapping LKMM atomics with Rust using FFI,

Using FFI:

    impl Atomic {
        fn read_acquire(&self) -> i32 {
            // SAFETY:
            unsafe { atomic_read_acquire(self as _) }
        }
    }

Using standard atomics:

    impl Atomic {
        fn read_acquire(&self) -> i32 {
            // self.0 is a Rust AtomicI32
            compiler_fence(SeqCst); // Rust does not support volatile atomics yet
            self.0.load(Acquire)
        }
    }

Needless to say, if we really need LKMM atomics in Rust, it's kinda my
job to implement these, so not much different for me ;-) Of course, any
help is appreciated!

I think a lot more mental effort goes into figuring out where and which barriers go everywhere.
But of course if there's curiosity driving you, then that small trade-off may be acceptable : )

something that hardly affects performance and will hopefully become obsolete
at some point in the near future.

I think another big question for me is to which extent it makes sense
anyways to have shared memory concurrency between the Rust code and the C
code. It seems all the bad concurrency stuff from the C world would flow
into the Rust world, right?
What do you mean by "bad" ;-) ;-) ;-)
Uh oh. Let's pretend I didn't say anything :D

If you can live without shared Rust & C concurrency, then perhaps you can
get away without using LKMM in Rust at all, and just rely on its (C11-like)
memory model internally and talk to the C code through synchronous, safer
ways.

First, I don't think I can avoid using LKMM in Rust: besides the
communication between the two sides, what if kernel developers just want to
use the memory model they have learned and understand (i.e. LKMM) in a new
Rust driver?
I'd rather people think 10 times before relying on atomics to write Rust
code.
There may be cases where it can't be avoided because of performance reasons,
but Rust has a much more convenient concurrency model to offer than atomics.
I think a lot more people understand Rust mutexes or channels compared to
atomics.
C also has more convenient concurrency tools in the kernel, and I'm happy
that people use them. But there are also people (including me) working
on building these tools/models, and inevitably we need to use atomics.

:D

1. Use Rust standard atomics and pretend different memory models
work together (do we have model tools to handle code in
different models communicating with each other?)

I'm not aware of any generic tools, and in particular for Rust and LKMM it will take some thought to create interoperability.
This is because the po | sw and ppo | rfe | pb styles are so different.

I did some previous work on letting SC and x86 talk to each other through shared memory. That is much easier because both can be understood through the ppo | rfe | coe | fre lens, the only difference being that on SC everything is preserved. It still wasn't completely trivial, because the SC part was itself implemented efficiently on x86, so it was a "partial-DRF-partial-SC" kind of deal.



2. Use Rust standard atomics and add extra mb()s to enforce more
ordering guarantees.

3. Implement LKMM atomics in Rust and use them with caution when
it comes to implicit ordering guarantees such as ppo. In fact, lots
of implicit ordering guarantees are available since the compiler
won't exploit the potential reordering to "optimize"; we also
kinda have tools to check:

https://lpc.events/event/16/contributions/1174/attachments/1108/2121/Status%20Report%20-%20Broken%20Dependency%20Orderings%20in%20the%20Linux%20Kernel.pdf

A good part of using Rust is that we may try out a few tricks
(with proc-macro, compiler plugins, etc.) to express some ordering
expectations, e.g. control dependencies.

Two suboptions are:

3.1 Implement LKMM atomics in Rust with FFI

I'd target this one ; ) It seems the most likely to work the way people want, with perhaps the least effort, and a good chance of not having overhead at some point in the future.

3.2 Implement LKMM atomics in Rust with Rust standard
atomics

I'm happy to figure out pros and cons behind each option.
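
For illustration only, here is a rough sketch of what option 3.2, combined with the extra barriers of option 2, might look like in plain Rust. It assumes that wrapping a relaxed read-modify-write in SeqCst fences is an acceptable stand-in for the smp_mb() pair that a fully ordered LKMM operation implies; whether that really gives LKMM ordering on every architecture is exactly the open question in this thread. The wrapper and its methods new, set_release, and fetch_add are hypothetical names chosen to mirror the C-side atomic_set_release() and atomic_fetch_add(); they are not an existing kernel API.

```rust
use std::sync::atomic::{fence, AtomicI32, Ordering};

/// Hypothetical LKMM-style wrapper around a Rust standard atomic.
pub struct Atomic(AtomicI32);

impl Atomic {
    pub const fn new(v: i32) -> Self {
        Atomic(AtomicI32::new(v))
    }

    /// Mirrors atomic_read_acquire(): an acquire load.
    pub fn read_acquire(&self) -> i32 {
        self.0.load(Ordering::Acquire)
    }

    /// Mirrors atomic_set_release(): a release store.
    pub fn set_release(&self, v: i32) {
        self.0.store(v, Ordering::Release)
    }

    /// Mirrors the fully ordered atomic_fetch_add() and returns the old value.
    /// ASSUMPTION: a SeqCst fence on each side of a relaxed RMW is used here
    /// in place of smp_mb(); this has not been audited against LKMM.
    pub fn fetch_add(&self, v: i32) -> i32 {
        fence(Ordering::SeqCst);
        let old = self.0.fetch_add(v, Ordering::Relaxed);
        fence(Ordering::SeqCst);
        old
    }
}
```

A caller would construct the value with Atomic::new(0) and use the methods much like their C counterparts. The extra fences are the price of option 2: they are always emitted, even on architectures where the plain RMW would already be strong enough.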



Best wishes, jonas