Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

From: Linus Torvalds
Date: Tue Aug 21 2007 - 12:51:46 EST




On Tue, 21 Aug 2007, Chris Snook wrote:
>
> Moore's law is definitely working against us here. Register counts, pipeline
> depths, core counts, and clock multipliers are all increasing in the long run.
> At some point in the future, barrier() will be universally regarded as a
> hammer too big for most purposes.

Note that "barrier()" is purely a compiler barrier. It has zero impact on
the CPU pipeline itself, and also has zero impact on anything that gcc
knows isn't visible in memory (ie local variables that don't have their
address taken), so barrier() really is pretty cheap.
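As a rough illustration of the point above, here is a minimal sketch of a compiler-only barrier. It assumes the usual GCC-style definition (an empty asm statement with a "memory" clobber); the names shared_flag and poll_until_set are hypothetical and only for this example. The barrier makes the compiler re-read shared_flag on every iteration, while the local counter, whose address is never taken, can stay in a register.

```c
/* Assumed GCC-style definition: an empty asm statement with a "memory"
 * clobber. It emits no instructions; it only tells the compiler that
 * memory may have changed behind its back. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical flag, set elsewhere (e.g. by an interrupt handler). */
int shared_flag;

/* Spin until shared_flag becomes non-zero; returns the number of spins. */
unsigned long poll_until_set(void)
{
    unsigned long spins = 0;   /* local, address never taken */

    while (!shared_flag) {
        barrier();             /* compiler must reload shared_flag */
        spins++;
    }
    return spins;
}
```

Without the barrier() call, an optimizing compiler would be allowed to read shared_flag once and turn the loop into an infinite one. No CPU fence instruction is involved either way.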

Now, it's possible that gcc messes up in some circumstances, and that the
memory clobber will cause gcc to also do things like flush local registers
unnecessarily to their stack slots, but quite frankly, if that happens,
it's a gcc problem, and I also have to say that I've not seen that myself.

So in a very real sense, "barrier()" will just make sure that there is a
stronger sequence point for the compiler where things are stable. In most
cases it has absolutely zero performance impact - apart from the
-intended- impact of making sure that the compiler doesn't re-order or
cache stuff around it.

And sure, we could make it more fine-grained, and also introduce a
per-variable barrier, but the fact is, people _already_ have problems with
thinking about these kinds of things, and adding new abstraction issues
with subtle semantics is the last thing we want.

So I really think you'd want to show a real example of real code that
actually gets noticeably slower or bigger.

In removing "volatile", we have shown that. It may not have made a big
difference on powerpc, but it makes a real difference on x86 - and more
importantly, it removes something that people clearly don't understand,
and that they incorrectly expect to just fix bugs.

[ There are *other* barriers - the ones that actually add memory barriers
to the CPU - that really can be quite expensive. The good news is that
the expense is going down rather than up: both Intel and AMD are not
only removing the need for some of them (ie "smp_rmb()" will become a
compiler-only barrier), but we're _also_ seeing the whole "pipeline
flush" approach go away, and be replaced by the CPU itself actually
being better - so even the actual CPU pipeline barriers are getting
cheaper, not more expensive. ]
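The difference between a compiler-only barrier and a real CPU memory barrier can be sketched in standard C11, outside the kernel. This is an analogy, not the kernel's own API: atomic_signal_fence() only constrains the compiler (roughly like barrier()), while atomic_thread_fence() may also emit a fence instruction, depending on the architecture and the memory order. The names payload, published and the two publish functions are hypothetical.

```c
#include <stdatomic.h>

int payload;            /* hypothetical data being published */
atomic_int published;   /* hypothetical "data is ready" flag */

/* Compiler-only ordering: the compiler may not move the stores across
 * the fence, but no fence instruction is generated for it. */
void publish_compiler_only(int v)
{
    payload = v;
    atomic_signal_fence(memory_order_seq_cst);
    atomic_store_explicit(&published, 1, memory_order_relaxed);
}

/* CPU-level ordering: same compiler constraint, and the compiler
 * typically emits a real fence instruction as well. */
void publish_with_cpu_fence(int v)
{
    payload = v;
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&published, 1, memory_order_relaxed);
}
```

Comparing the generated assembly for the two functions is one way to see which of them actually costs anything at the CPU level on a given target.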

For example, did anybody even _test_ how expensive "barrier()" is? Just
as a lark, I did

#undef barrier
#define barrier() do { } while (0)

in kernel/sched.c (which only has three of them in it, but hey, that's
more than most files), and there were _zero_ code generation downsides.
One instruction was moved (and a few line numbers changed), so it wasn't
like the assembly language was identical, but the point is, barrier()
simply doesn't have the same kinds of downsides that "volatile" has.
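For anyone who wants to repeat this kind of experiment on a small scale, a sketch along these lines may help. It assumes the GCC-style barrier() definition; the NO_BARRIER switch, the file name demo.c, and the function bump_twice are made up for the example and are not from the kernel tree.

```c
/* Build twice and diff the generated assembly, for example:
 *   gcc -O2 -S demo.c -o with.s
 *   gcc -O2 -S -DNO_BARRIER demo.c -o without.s
 */
#ifdef NO_BARRIER
#define barrier() do { } while (0)                         /* barrier removed */
#else
#define barrier() __asm__ __volatile__("" : : : "memory")  /* assumed definition */
#endif

int counter;   /* global, so it is visible in memory */

int bump_twice(void)
{
    int local = 10;   /* address never taken, unaffected by barrier() */

    counter++;
    barrier();        /* with the barrier, counter is stored and reloaded here */
    counter++;
    return local + counter;
}
```

With the barrier in place the compiler has to emit two separate increments of counter; without it, the two may be merged into a single add. In neither case is a barrier instruction emitted, and how much the output differs will depend on the compiler version and target.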

(That may not be true on other architectures or in other source files, of
course. This *does* depend on code generation details. But anybody who
thinks that "barrier()" is fundamentally expensive is simply incorrect. It
is *fundamentally* a no-op).

Linus