Re: [RFC][PATCH 0/5] arch: atomic rework

From: Will Deacon
Date: Thu Feb 06 2014 - 14:00:03 EST


On Thu, Feb 06, 2014 at 06:55:01PM +0000, Ramana Radhakrishnan wrote:
> On 02/06/14 18:25, David Howells wrote:
> >
> > Is it worth considering a move towards using C11 atomics and barriers and
> > compiler intrinsics inside the kernel? The compiler _ought_ to be able to do
> > these.
>
>
> It sounds interesting to me, if we can make it work properly and
> reliably. + gcc@xxxxxxxxxxx for others in the GCC community to chip in.

Given my (albeit limited) experience playing with the C11 spec and GCC, I
really think this is a bad idea for the kernel. It seems that nobody really
agrees on exactly how the C11 atomics map to real architectural
instructions on anything but the trivial architectures. For example, should
the following code fire the assert?


extern atomic<int> foo, bar, baz;

void thread1(void)
{
foo.store(42, memory_order_relaxed);
bar.fetch_add(1, memory_order_seq_cst);
baz.store(42, memory_order_relaxed);
}

void thread2(void)
{
while (baz.load(memory_order_seq_cst) != 42) {
/* do nothing */
}

assert(foo.load(memory_order_seq_cst) == 42);
}


To answer that question, you need to go and look at the definitions of
synchronises-with, happens-before, dependency_ordered_before and a whole
pile of vaguely written waffle to realise that you don't know. Certainly,
the code that arm64 GCC currently spits out would allow the assertion to fire
on some microarchitectures.

There are also so many ways to blow your head off it's untrue. For example,
cmpxchg takes a separate memory model parameter for failure and success, but
then there are restrictions on the sets you can use for each. It's not hard
to find well-known memory-ordering experts shouting "Just use
memory_model_seq_cst for everything, it's too hard otherwise". Then there's
the fun of load-consume vs load-acquire (arm64 GCC completely ignores consume
atm and optimises all of the data dependencies away) as well as the definition
of "data races", which seem to be used as an excuse to miscompile a program
at the earliest opportunity.

Trying to introduce system concepts (writes to devices, interrupts,
non-coherent agents) into this mess is going to be an uphill battle IMHO. I'd
just rather stick to the semantics we have and the asm volatile barriers.

That's not to say I don't there's no room for improvement in what we have
in the kernel. Certainly, I'd welcome allowing more relaxed operations on
architectures that support them, but it needs to be something that at least
the different architecture maintainers can understand how to implement
efficiently behind an uncomplicated interface. I don't think that interface is
C11.

Just my thoughts on the matter...

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/