Re: Is clobber "memory" in include/asm-i386/system.h necessary?

Linus Torvalds (torvalds@cs.helsinki.fi)
Thu, 2 May 1996 15:49:32 +0300 (EET DST)


> In include/asm-i386/system.h there are numerous uses of __asm__ which
> clobber "memory". For reference, here they are:
>
> #define mb() __asm__ __volatile__ ("" : : :"memory")
> #define sti() __asm__ __volatile__ ("sti": : :"memory")
> #define cli() __asm__ __volatile__ ("cli": : :"memory")
>
> #define save_flags(x) \
> __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */ :"memory")
>
> #define restore_flags(x) \
> __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory")
>
> #define iret() __asm__ __volatile__ ("iret": : :"memory")
>
> (Uh, iret() is never even used . . .)
>
> This clobber seems unnecessary since none of these instructions modify
> memory that gcc doesn't know about, and any memory that is modified by
> other means (hardware, interrupt handlers, etc.) is (should be?)
> accounted for in other ways such as by using volatile.

No, the memory clobber is _important_. The Sparc version didn't have it, and it
resulted in not-so-nice bugs.

Essentially, think of the "memory" thing as a serializing flag rather than as a
"this instruction changes memory" flag. It is extremely important that gcc
_not_ re-order memory instructions across these serializing instructions,
because otherwise you might find that gcc moves a memory load over the
serializing instruction and then you lose (because you use a value that was
true _before_ interrupts were disabled)..

Using "volatile" for any data structures that can be modified by interrupts etc
is also very broken, because then you take a noticeable performance hit because
then gcc can't do any optimizations _at_all_ with that variable (and see below
about more on the total braindamage of that whole approach).

In short, it's much better to tell gcc not to optimize around these
instructions, so that it then can optimize safely everywhere else. It's like
the whole idea behind RISC: take a performance hit for the unusual case, but
optimize the normal case.

> As an experiment, I removed the clobbers, built a kernel, and it ran
> fine (but if you try this at home, you will trigger the gcc common
> subexpression elimination bug, so until it is fixed for real, your
> kernel may or may not work).

It will seem to work even without the memory clobber, but you'll have races you
don't even know about. It's just that you aren't aware of them, because they
happen very rarely (which makes them all the more insidious).

> So what's up with these clobbers? If they aren't necessary, they
> should be removed because they have a detrimental effect on code
> generation. If they are necessary, how about a comment explaining
> why?

Simple example, and one which actually bit the Sparc people:

int del_timer(struct timer_list * timer)
{
int ret = 0;
if (timer->next) {
unsigned long flags;
struct timer_list * next;
save_flags(flags);
cli();
if ((next = timer->next) != NULL) {
(next->prev = timer->prev)->next = next;
timer->next = timer->prev = NULL;
ret = 1;
}
restore_flags(flags);
}
return ret;
}

Look at how this is done - we first check if "timer->next" is non-zero, and
only if it's non-zero do we bother to do the (expensive) linked list fixup.
Note that the expense is not so much the linked list handling in itself, as the
fact that we have to disable interrupts around it, and as we often call
"del_timer()" when something isn't on the timer list (the caller can't know
that before it tries to delete it), it makes sense to have a special fast case
for the case where we don't need to do anything.

HOWEVER, if the interrupt setting stuff didn't force memory access
serialization, then gcc might (and does) cache the value of "timer->next"
across the interrupt disable, and that is definitely incorrect.

Now, somebody is bound to say "volatile" here, but I refuse to use volatiles
because if the linked list entries were made volatile than gcc would be doing
lots of unnecessary things in _other_ places. Besides, "volatile" really
doesn't make any sense at all, and I think C would be better off without it.

One reason I hate "volatile" is that using it for things that are changed by
interrupts is _fundamentally_ broken. And because I'm a real preacher, I'll
tell you all why:

- if you use a value that can change from under you with no warning, you're
_asking_ for bugs. You should _always_ make sure you have locked the value
down some way. Either by disabling interrupts, or by using semaphores or
similar to enter a critical section. If you don't do that, you deserve to
lose, and "volatile" doesn't buy you anything.

Essentially, using "volatile" is a way of saying "yes, I know this code doesn't
work, but I hope the compiler will magically make the problem go away". People
like that should be shot on the spot.

Right after this I have to mention a few exceptions to the rule: there _are_
two circumstances where using "volatile" is acceptable.

1) for a clock. Look at "jiffies". The whole _idea_ with a clock is that it
changes, and it's _ok_. A clock is not a linked list or anything like that.
2) For accessing hardware registers internally. No kernel code should do it
(they should use "readb()" etc that should take care of it), but in that
context it's ok. Again, for the same reason as in (1) - it's not a data
structure, it's just a very special "value".

Ok, enough preaching. But the above hopefully explains why the interrupt
inline asm statements are marked as clobbering memory, and if you take a look
at the "mb()" statement and the use of that for "start_bh_atomic()" etc, you'll
see the same logic at work (without any other inline assembly at all).

Essentially, you can think of the memory clobbers as barriers, and think of
"cli" and "sti" as

cli();
barrier();

and

barrier();
sti();

respectively.

Linus