Re: spin_unlock optimization(i386)

Gerard Roudier (groudier@club-internet.fr)
Sun, 21 Nov 1999 12:29:00 +0100 (MET)


On Sun, 21 Nov 1999, Manfred Spraul wrote:

> "Richard B. Johnson" wrote:
> >
> > On Sat, 20 Nov 1999, Manfred Spraul wrote:
> >
> > > the current spin_unlock asm code is
> > > "lock; btrl $0,%0"
> > > it takes ~ 22 ticks on my PII/350.
> > >
> > > I think it's possible to replace that with
> > > "movl $0,%0"
> > > which would be a simple, pairable single-tick instruction.
> >
> > Erm.... What about SMP machines? Are you going to get rid of them?
> > The purpose of the lock prefix is not to make the current CPU operation
> > atomic. It's to make all other CPUs halt until the operation is complete.
> > This gurantees that only one CPU modifies the variable at the same
> > time. You are not going to do that with a move.
> >
> I'm only talking about the _unlock_, obviously spin_lock() must use the
> lock
> prefix, and spin_lock needs a full memory barrier:
> unlock means that you own the lock, and that all other CPUs are waiting,
> and: we know the current value of spinlock_t.lock:
>
> lock;btrl means
> cpu pull LOCK
> cpu reads spinlock_t.lock (always 0x0000 0001)
> cpu clears bit 0, and updates the carry flag
> cpu writes spinlock_t.lock (always 0x0000 0000)
> cpu releases LOCK
>
> mov means
> cpu writes spinlock_t.lock (always 0x0)
>
> memory writes to 32-bit values are always atomic (and atomic_set()
> relies on this)
> --> the only difference is the memory ordering.

You may be right, and I am not going to say that you are not so given that
I am not a champion in hardware architecture.

However I seem to know that hardware protocols that address
synchronisation of memory, caches and CPU buffers, especially in SMP
systems, are not so perfect. For the i386 arch, the things that allow to
ensure some order on that mess-up are the LOCK prefix and some
serialisation instruction.

In my opinion, relying on apparent ordering instead of enforcing it may
lead to subtil race conditions. We may, for example, be victimized by some
hardware erratum due to brain-deaded hardware optimizations, if we try to
be clever when we should avoid to do so.

Obviously, for the situation you pointed out, we only want to change
atomically a single 32 bit quantity we know about its previous value and
using the LOCK prefix is indeed too much. But, you should, IMHO, base you
demonstration on what may _actually_ happen rather than what seems to
happen when the LOCK prefix is not used. (I mean TLB misses, cache
protocol possible situations and other goodies as CPU write buffers,
etc... that break apparent atomicity and may lead to races)

Just my 0.02Euros.

Gérard.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/