Re: Read/write locks

Colin Plumb (colin@nyx.net)
Sun, 21 Sep 1997 17:19:54 -0600 (MDT)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Charlie Brady: "Re: SCSI timeouts - 53C810, QIC-150, 2.0.31pre9"
Previous message: Andreas Muck: "EXT2-fs errors with 2.0.30"
Maybe in reply to: Colin Plumb: "Read/write locks"

The Great and Powerful Linus did did write unto the Masses:
> No, for spinlocks, the upgrade lock doesn't get any fatter at all, it
> should fit right in with the current write-lock scheme (which I still
> consider extremely clever of me, I haven't seen others do it that way).

I've used positive = read locks, -1 = write lock before, similarly using the
sign bit.

> I _think_ the only thing you'd need to do to distinguish a write lock
> from an update lock is that the update-lock doesn't need to wait for
> readers to go away (you do that only if/when you decide to really
> upgrade to a real write lock).

And since the current locks are implemented this way anyway, it just
consists of splitting the code into two parts.

> That approach would make the update lock lock out any new readers, but
> at least it is a very simple implementation, and to some degree it can
> be considered a feature (we'd prioritize updaters over readers -
> something that may or may not be what we want to do anyway).

In fact, the current code does *not* do this. It lets new readers in
while a writer is trying to get a write lock, which leaves open the
possibility of starvation for the writer - a continuous stream of readers
can leave the writer blocked forever.

(Admittedly farfetched, given the expected level of contention for
spinlocks.)

It seems odd that the code clears the writer-locked bit (bit 31) if there
are readers present. Wouldn't it make more sense to block incoming readers
while a writer is contending?

If you want the reader-priority scheme, then a write lock actually
gets simpler given CAS, CMPXCHG, or the like.

Basically, it's:
asm volatile("
1: lock; cmpxchg %0,%2
jne 2f
.section .text.lock,"ax"
2: movl [%0],%1
testl %1,%1
jne 2b
jmp 1b
.previous
" : "r" (rw), "a" (0), "r" (0x80000000) : "cc", "memory");

This version has the advantage that it lets gcc choose the registers
to use and tells it as much as possible about the state of the
registers at the end. (e.g. it knows that %eax is 0).
(Given that SMP doesn't work on 386's, I presume it's okay to use
CMPXCHG in generic x86 SMP code?)

Mini-tutorial:

For those who don't understand SMP locking intricacies, there are two
parts to any well-working spinlock. The atomic test-and-set
instruction which actually does the operation, and a read-only spin
loop waiting for a chance to use the test-and-set. You could just spin
trying the test-and-set, but every time you do that it causes a complex
inter-processor transaction on the bus which eats performance FOR EVERY
OTHER PROCESSOR IN THE SYSTEM. A read-only spin is satisfied out of
the processor's primary cache so it goes off the global bus until the
other processor releases the lock and broadcasts an invalidate on the
bus.

Which part you drop into first depends on the amount of contention you
expect for the lock. If you think it's going to be busy when you get
to it, it's best to test first, but if you think it'll probably be free,
you can optimistically try the lock first thing, and only spin reading
if that fails. That is the approach taken here. Also note that the
code to spin reading is actually out of line, so it won't even get
loaded into cache in the wait-free case.

Oh, yes, you may notice that this sort of "spin lock" is busy-waiting,
the thing that you were told is a strict no-no in a multitasking
system? Yes, indeed it is. This is because it's very very cheap
in the no-spin case, while something more complex that can give up
the processor is quite expensive in the case where there is no actual
contention, so the average amount of time taken by an (inefficient)
spinlock is very often much less!

The important rule is don't hold a spinlock for too long, or other
processors will start piling up waiting for it. (Which is also why you
don't want to enable interrupts while holding a spinlock, because you
should be done with a spinlock even faster than with an interrupt.)

-- 
	-Colin

Next message: Charlie Brady: "Re: SCSI timeouts - 53C810, QIC-150, 2.0.31pre9"
Previous message: Andreas Muck: "EXT2-fs errors with 2.0.30"
Maybe in reply to: Colin Plumb: "Read/write locks"