Re: [Patch] shm bug introduced with pagecache in 2.3.11

Linus Torvalds (torvalds@transmeta.com)
Sat, 20 Nov 1999 16:33:49 -0800 (PST)


On Sat, 20 Nov 1999, Marcelo Tosatti wrote:
>
> http://bazar.conectiva.com.br/~marcelo/rwsem-2.3.18ac7.patch
> This code is a Linux "port" of the psedo-code implementation found in the
> "Unix Kernel Internals" book i wrote some time ago.

Well, if it's a port of that, then it won't have the 2-instruction
fast-path that is pretty much required, imho.

I'll see if I can get a free afternoon some day and try to port the
current x86 semaphore code over to a rw version too. The plan was
something like this:

- read_down():

lock ; incl mem
js contention_rw

- read_up():

lock ; decl mem
js wake_up_writer

- write_down():

lock ; btsl $31,mem
jc contention_ww
testl $0x7fffffff,mem
jne contention_wr

- write_up():

lock ; andl $0x7fffffff,mem
jne wake_up_reader_or_writer

where all the three contention cases grab a "contention spinlock" before
they then start sorting things out. The only interesting part is making
sure that the contention case gets the wakeups, and the above counts on:

- if a writer is waiting for readers (contention_wr), then the writer
will have already set the high bit, and a reader will know to wake it
up because the rw-semaphore value will be negative when it does
read_up().

- if a reader is waiting for a writer, then the reader will have
incremented the semaphore, and the writer will know to wake it up
becasue the semaphore value won't be zero after the "write_up()".

- if a writer is waiting for another writer (contention_ww case), it will
have to increment the "reader" part of the semaphore value, in order to
get the other writer to wake it up on "write_up()".

All other races should be trivially handled by just having the spinlock,
so the only really hard cases are the fast-path stuff where we cannot get
the semaphore because it is too expensive.

Does anybody see any holes in the above pseudo-implementation? Please take
a look at the way the current x86 semaphores are implemented: they use
exactly the above kinds of single-atomic-instruction-plus-condition-codes
trickery to get the non-contention case without _any_ extra instructions.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/