RE: spin_unlock optimization(i386)

David Schwartz (davids@webmaster.com)
Fri, 26 Nov 1999 23:55:01 -0800


> On Fri, 26 Nov 1999, David Schwartz wrote:
>
> > > If not, does it mean that the serializing lock read made by CPU B
> > > while spinlocking forces a global (=on all CPUs) commit of any pending
> > > store (thus making the whole system work as WT, i.e. write-through,
> > > for as long as B spinlocks)?
> >
> > Memory barriers enforce order. They can't force an actual commit.
>
> Does the 'lock' before the 'btsl' mean it's a memory barrier?
>
> As i understand it, if you read something right after a mb(),
> you see every previous store (system-wide), as if in Strict Order.

No. Memory barriers are not system-wide. They only affect the reordering of
memory operations from a single processor.
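
To make that concrete: on i386 a memory barrier is just an instruction
sequence executed by the CPU that issues it. Roughly (this is a sketch of
what the kernel's i386 mb() and barrier() macros amount to, not a copy of
the headers):

/* Sketch only: stand-ins for the kernel's i386 barrier macros.
 * mb(): a locked read-modify-write of a stack word plus a "memory"
 *       clobber; it keeps *this* CPU's own loads and stores from being
 *       reordered across it, and touches no other CPU.
 * barrier(): compiler-only; stops GCC from caching values across it.  */
#define mb()      __asm__ __volatile__("lock; addl $0,0(%%esp)" : : : "memory")
#define barrier() __asm__ __volatile__("" : : : "memory")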

> I.e. for (;;) { mb(); read a; } will force all the pending stores on other
> CPUs to be committed before every read (?)

No. No. A memory barrier simply prevents the active processor from
'migrating' reads or writes across the barrier.
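
Barriers get used in pairs, one on each side of the communication, and each
one constrains only its own CPU. A minimal sketch (hypothetical writer/reader
pair, using the two macros sketched above):

int data, flag;                 /* shared, both initially 0 */

void writer(void)               /* runs on CPU A */
{
        data = 42;
        mb();                   /* order A's store to data before A's
                                   store to flag; A's accesses only    */
        flag = 1;
}

int reader(void)                /* runs on CPU B */
{
        while (flag == 0)
                barrier();      /* reload flag each trip; the coherency
                                   logic eventually delivers the new
                                   value, the barrier does not hurry it */
        mb();                   /* don't let B's read of data be done
                                   before B's read of flag              */
        return data;            /* guaranteed to be 42 */
}

Neither barrier pushes anything out of CPU A's cache; the pairing plus the
cache coherency logic is what makes the result come out right.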

> BTW, I thought that a lock
> does something even worse: prevents all other CPUs from executing any
> read/store until the "locked" instruction on this CPU is completed, thus
> "serializing" all read/stores system-wide.

No, that's not what a lock does either. A lock simply makes a
'read-modify-write' operation atomic.
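
The 'lock' prefix on the btsl you mentioned buys exactly that atomicity and
nothing else. As a rough sketch of a test-and-set spinlock built on it (an
illustration, not the kernel's actual spin_lock/spin_unlock):

static inline void my_spin_lock(volatile unsigned long *lock)
{
        for (;;) {
                int was_set;

                /* 'lock; btsl' atomically copies bit 0 into CF and sets it;
                 * the lock prefix only guarantees that no other CPU can get
                 * in between the read and the write of that one word.       */
                __asm__ __volatile__(
                        "lock; btsl $0, %1\n\t"
                        "sbbl %0, %0"            /* was_set = CF ? -1 : 0 */
                        : "=r" (was_set), "+m" (*lock)
                        : : "memory", "cc");

                if (!was_set)
                        return;                  /* bit was clear: we own the lock */

                while (*lock & 1)
                        ;                        /* spin on plain cached reads
                                                    until it looks free, retry  */
        }
}

static inline void my_spin_unlock(volatile unsigned long *lock)
{
        __asm__ __volatile__("" : : : "memory"); /* compiler barrier only      */
        *lock = 0;                               /* plain store; x86 stores are
                                                    not reordered with each
                                                    other                       */
}

Only the locked btsl ever has to arbitrate for the cache line; the inner loop
is ordinary cached reads.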

> Such a thing may be (i think it is) needed in the lock() since you have to
> serialize the read (no two CPUs can see a value of 0 at the same time),
> even if the variable is cached (on both). To draw it:
>
>
>         CPU A                             CPU B
>
> asserts LOCK: waits for other             <before the spinlock, don't care>
> CPUs to complete pending
> (read/store) instructions,
> prevents them from executing
> others, commits all changes
> (makes them visible) on all CPUs
>
> reads and stores the updated              <"blocked" by CPU A>
> spin_lock value
>
> releases LOCK, other CPUs
> may now go on reading/
> writing
> *note that the store may not
> be committed at this time*
>                                           <resumes>
>
>                                           tries to do the same lock():
>                                           and as a side effect, the store
>                                           made by CPU A gets committed (*)
>
>                                           reads the spin_lock value and
>                                           fails (lock is held by CPU A)
>
>                                           <loops>
>
> releases the lock (store)
>
>                                           goes through the LOCK thing again,
>                                           the store on CPU A gets committed,
>                                           now the test succeeds and the
>                                           lock is acquired
>
> (*) that's where the serializing properties of 'lock' are needed. The
> following read MUST see the updated value of spin_lock. No delay is
> allowed at all. 'lock' on CPU A has to hold back the read on CPU B until
> the store is executed, and 'lock' on CPU B has to make sure the store
> made by A is visible to B (ie committed) before the following read.

Your analysis is entirely wrong because it is grounded in misunderstandings
of what memory barriers and locks actually do. They have nothing to do with
'committing' reads or writes to main memory.

> > Usually, a mutex lock is implemented as:
> >
> > lock();
> > mb();
> >
> > And an unlock as:
> >
> > mb();
> > unlock();
> >
> > While this ensures correct behavior, it does not assure that you
> > don't wait 'too long'. And a memory barrier can't do that anyway.
>
> I see. Yet, i've read that just a store acts as a mb, and if that's true,
> and if a mb is all we need, then i must be wrong about the whole
> purpose of 'lock'. And i'm lost.

On x86 architectures, stores are always performed in order. However, reads
may be reordered and the relative order of reads and writes is not
guaranteed. This is what memory barriers are for.
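
So in the lock();/mb(); and mb();/unlock(); shape quoted above, the barriers
are there purely to pin that relative order down on the CPU that holds the
lock. A sketch with generic names (nothing platform-specific; lock(),
unlock() and mb() standing for whatever the port provides):

extern void lock(void);         /* some atomic read-modify-write acquire */
extern void unlock(void);       /* the matching release                  */
extern void mb(void);           /* the memory barrier                    */

int shared_count;

void bump(void)
{
        lock();                 /* take the mutex */
        mb();                   /* keep the critical section's loads and
                                   stores from drifting up before the
                                   acquire, on this CPU                   */
        shared_count++;         /* protected access */
        mb();                   /* keep them from drifting down past ...  */
        unlock();               /* ... the store that releases the mutex  */
}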

> > The cache coherency logic, on the other hand, should assure that the
> > loop eventually reads the right data, since it forces the cache of that
> > processor to assume ownership (or at least shared ownership) of that
> > chunk of memory.
>
> Does this mean that other pending stores on that chunk get written?
> So the "forced commit" i'm referring to happens only to that chunk, and
> that's not as bad as i thought. That makes a lot of sense. Still i think
> the 'lock' also serializes operations on different, possibly unrelated,
> variables, so all pending stores have to be committed on all CPUs,
> all chunks.

There are no forced commits directly. However, the cache coherency logic
will do the equivalent if one processor tries to gain access to an area of
memory that another processor has in its cache.

> > What does 'forcing a commit' mean? You're not suggesting the cache
> > be flushed, are you?
>
> Well, i have no clear idea here... remember i haven't read the real
> docs, so i don't know what a 'commit' is; i'll explain what i mean:
> basically, commit means making the other CPUs see the change *now*.
> Whether this means flushing the cache, or just making the change visible
> through some 'snooping' mechanism without performing the actual write,
> i don't know.

This is already done by the cache coherency logic. All operations are
always committed exactly when needed to avoid conflict with an operation
from another processor on the same memory.

> My concern here is really the penalty of having one CPU spinning on
> an instruction with the 'lock' serializing property, which, given that
> it works the way i've described (and it may not), kills performance
> system-wide. Having it spin on a nop for a little while may let the other
> CPUs go on with their work undisturbed, at the price of some
> wasted cycles.

The thing is, the other CPU will eventually cause the cache coherency logic
to arbitrate for that area of memory. That will force what you call a commit
(but actually it's a writeback).

DS

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/