RE: spin_unlock optimization(i386)

David Schwartz (davids@webmaster.com)
Tue, 30 Nov 1999 03:25:21 -0800


> "David Schwartz" <davids@webmaster.com> wrote:
> > A speculative read is never "thrown away" if it's needed. It's only
> > thrown away if it's not executed (say, due to a conditional branch
> > instruction).
>
> That implies weak ordering if in the process of optimizing (hoisting
> the read before the write), the CPU creates a result different from
> the case that the instructions were executed in order. I don't think
> IA32 is weakly ordered. Our compilers would need to do a lot more
> work (as for a MIPS, say).

IA32 is weakly ordered with respect to an outside observer but strongly
ordered with respect to the code running on the processor itself. The result
(with no outside interference) is always exactly the same as what would occur
if it were strongly ordered.

Put another way, the processor will reorder whenever it thinks that such
reordering cannot be noticed by the code it is executing.

For example:

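volatile int valid = 0, value;
int output;

CPU0:
    while (valid == 0)
        ;               /* spin until CPU1 sets the flag */
    output = value;     /* second read, no dependency on the first */

CPU1:
    value = 45;         /* write the data ... */
    valid = 1;          /* ... then publish it */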

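Now CPU0 sees no dependency between the two reads, so it's free to use a
speculative read for 'value' -- one that actually took place on the bus
before the read of 'valid' that ended the while loop. If that early read
happened before CPU1's write, 'output' ends up with the old contents of
'value', not 45.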
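
The interleaving described above can be replayed step by step. What follows
is only a single-threaded model of the argument, not a demonstration of what
real IA32 hardware does; the struct and the function name are made up for
illustration.

```c
/* Hypothetical model of the two variables shared by CPU0 and CPU1. */
struct shared_memory {
    int valid;
    int value;
};

/*
 * Replays, in one thread, the order of events the example argues is
 * possible.  Returns what CPU0 would store in 'output'.
 */
int replay_speculative_read(void)
{
    struct shared_memory mem = { 0, 0 };
    int speculated;
    int output;

    speculated = mem.value;     /* 1: CPU0 reads 'value' early */
    mem.value = 45;             /* 2: CPU1 writes the data */
    mem.valid = 1;              /* 3: CPU1 sets the flag */
    while (mem.valid == 0)      /* 4: CPU0's loop now falls through */
        ;
    output = speculated;        /* 5: CPU0 uses the early read */

    return output;              /* the stale 0, not 45 */
}
```

In this model CPU0 leaves the loop only after 'valid' is 1, yet still ends up
with the value from before CPU1's write.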

> > > Just flip the diagram to get the case for B.
> > >
> > > It is my understanding that a read for A that follows a write to B on
> > > one processor can move *before* that write, but the read result on A
> > > isn't retired until after the write to B occurs (in processor order).
> >
> > Correct, the read isn't retired, but the value read earlier is the
> > value that will be retired when the retire takes place.
>
> Are you sure about that? I think that's the issue this list is trying
> to resolve.

Positive. The processor has no way to know that another processor modified
the value 'out from under it', so it has no reason not to use the results of
the speculative read.

> > > The processor sees all writes before its own that have occurred after
> > > the read (so it sees A=1), and if any of them touch A, it will
> > > invalidate the read on A, *before* its result is retired. The read
> > > is redone, and it gets the correct value for A.
> >
> > The read is never redone. How would the processor possibly know to
> > redo the read? And why would it? Are you suggesting that every
> > speculative read be redone if any chunk of memory is written to by
> > any processor?
>
> Not every read. In the simplest case, it would throw away any read
> that preceded a write, if it sees another write (to any memory
> location) coming from another processor before its write. In the
> best case, it would compare the address and size of the read.

It can't see writes from other processors to other memory areas. If they
already hold those areas exclusive, the write is not even visible on the bus!

> Why would it do this? To simplify the compiler writer's job.
>
> If it didn't do this, then how do you explain this expected result,
> that C so relies on so very much.
>
> code sequence:          C:
>   write A=0             A = 0;
>   read A (expect 0)     B = A;
>   write B=A
>
> Your comments suggest this sequence:
>
> read A (A=1) <- speculative read
> write A=0
> write B=A (B=1) <- use results of speculative read
>
> And this isn't even MP!!

In this case, the processor can see the dependency, so it knows to handle
this. What it in fact does (on the P2) is even smarter: it uses register
renaming to speculatively read in the 'wrong place' that happens to contain
the right value.

In other words, any x86-family processor that reorders has to do
in-processor dependency analysis. The extent to which it can do this
analysis is the extent to which it can reorder. However, it is impossible
for it to see dependencies across processors. This is why we need memory
barriers.
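
As a rough sketch of where the barriers go in the valid/value example, here
is a userspace version using C11 atomics and pthreads. Both postdate this
thread, so treat them as stand-ins for whatever barrier primitives are
actually available; run_handoff() is a made-up driver for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int valid;    /* flag: 0 until 'value' has been published */
static int value;           /* the data, deliberately a plain int */

/* CPU1 in the example: write the data, then set the flag. */
static void *producer(void *arg)
{
    (void)arg;
    value = 45;
    /* release: the write to 'value' is visible before 'valid' reads 1 */
    atomic_store_explicit(&valid, 1, memory_order_release);
    return NULL;
}

/* CPU0 in the example: wait for the flag, then read the data. */
static void *consumer(void *arg)
{
    int *output = arg;

    /* acquire: the read of 'value' cannot be satisfied before this load */
    while (atomic_load_explicit(&valid, memory_order_acquire) == 0)
        ;
    *output = value;
    return NULL;
}

/* Hypothetical driver: returns what the consumer saw, or -1 on error. */
int run_handoff(void)
{
    pthread_t c, p;
    int output = -1;

    atomic_store(&valid, 0);
    value = 0;
    if (pthread_create(&c, NULL, consumer, &output) != 0)
        return -1;
    if (pthread_create(&p, NULL, producer, NULL) != 0) {
        /* let the spinning consumer exit before reporting failure */
        atomic_store_explicit(&valid, 1, memory_order_release);
        pthread_join(c, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return output;
}
```

With the release/acquire pair in place, a consumer that sees valid == 1 is
guaranteed to read 45. Without it, nothing tells either processor that the
two locations are related.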

DS
