Re: spin_unlock optimization(i386)

Marco Colombo (marco@esi.it)
Sat, 27 Nov 1999 01:24:03 +0100 (MET)


Hi all. Guys, please be patient: i'm a newbie and i've got some maybe
silly questions on the topic this interesting thread is about:

as far as i understand it (reading only this thread, no Intel docs),
under Processor Order, CPU B sees stores made by CPU A in order, but
possibly delayed. I was wondering if this may cause unnecessary spins
on a CPU which is spinlocking, "waiting" to acquire the lock: it is
possible that CPU A releases the lock with a store, and CPU B doesn't
see it for a while? Note that this is not an ordering issue:
from one of Mr. Boleyn messages i understand that if CPU B acquires
the lock seeing the store in the unlock() made by CPU A, it must also
see all stores previously made by A in the critical section: in other
words, they are committed from B's point of view. The problem is not
that A critical section becomes longer (which may happen, and it's
harmless, a Linus clearly stated) but not letting B section become
shorter (with some of its reads executed *before* entering it). And that
won't happen, given Processor Order.

So, my question: can reads made by CPU B on the lock variable see an old
value (=busy) even after the unlock() made by A, for a while? And if yes,
how big is this interval (after all, B is really wasting cycles...)?
If not, does it mean that the serializing lock read made by CPU B
while spinlocking force a global (=on all CPUs) commit of any pending
store (thus making all the system work as WT for all the time B spinlocks)?

So, either 1) we're wasting cycles on CPU B spinning on a *free* lock (for
a while), or 2) the whole system performs as WT (or even worst) while B
is spinning. Am i missing something?

Sorry for the long message, and the waste of bandwith...
TiA,
.TM.

P.S.
BTW,
1) could be solved forcing a commit on CPU A right after (as part of) s
the unlock() (but, is it possible? were we doing precisely that, before
the whole thread starts?);
2) could be solved having B spinning inside two loops, the inner without
any lock, for a suitble number N of cycles, the outer which does the lock();
N has to be computed carefully. This really matters when the number of
CPUs is high, say 16, i think. Assuming A acquired the lock, and B is waiting
for it, the other 14 CPUs will have their activity serialized by B
continuously issuing lock() (BTW, is it true?).

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/