Re: [patch] waitqueue optimization, 2.4.0-test7

From: Andrea Arcangeli (andrea@suse.de)
Date: Mon Aug 28 2000 - 19:38:57 EST

Next message: Alexander Viro: "Re: SCO: "thread creation is about a thousand times faster than on"
Previous message: Andrea Arcangeli: "Re: [patch] waitqueue optimization, 2.4.0-test7"
In reply to: Ingo Molnar: "Re: [patch] waitqueue optimization, 2.4.0-test7"
Next in thread: Ingo Molnar: "Re: [patch] waitqueue optimization, 2.4.0-test7"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, 28 Aug 2000, Ingo Molnar wrote:

>yep, these are the cases i'm worried about as well. I thought that in all
>cases where we use a waitqueue we do have some other synchronized object
>as well (which administers the event/object we are waiting for) so in 99%

Yes, I see: we often do the wakeup within a critical section protected by
a spinlock, for example. But that's not required. One case I had in mind
that doesn't use a synchronization object is UnlockPage, and I'm sure
there are more of those sync-less cases among the nine hundred wakeups in
the kernel code 8).

>of the cases we dont rely on the synchronization of the wake_up() itself.
>Isnt UnlockPage() atomic already (and a read/write barrier) to keep the
>bits themselves consistent?

In the UnlockPage case we use clear_bit because of a detail: we want to
save a word in each struct page. If, for example, we added a new
page->locked integer field just for the locked information, we could avoid
the clear_bit and do page->locked = 0, and then we would rely on the
spin_lock of wake_up even on IA32. (Note that I'm definitely not
suggesting we add page->locked. If anything, I'd suggest moving the
referenced bit into a page->age field, also for other reasons; then we
could enforce that page->flags never changes unless we hold the PG_locked
bitflag, because shrink_mmap wouldn't change page->flags from under us
anymore.)

That's what I meant when I pointed out that clear_bit is necessary for
things like shrink_mmap, where we play in the same word with unrelated
bits at any time, so we have to be atomic to avoid corrupting
page->flags.
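
A minimal userspace sketch of the point above, using C11 atomics rather
than the kernel's own clear_bit. The bit numbers and the function names
(lost_update_demo, clear_bit_sketch, set_bit_sketch, atomic_demo) are
assumptions made for illustration and are not taken from the kernel
source. The first function replays by hand the interleaving that a plain
C read-modify-write permits; the second uses atomic operations, so no
update can slip in between the load and the store.

```c
#include <stdatomic.h>

/* Hypothetical bit numbers, for illustration only. */
enum { PG_LOCKED = 0, PG_REFERENCED = 2 };

/*
 * Replays, step by step, the interleaving that a plain C
 * "flags &= ~mask" allows when another CPU touches the same word.
 * Everything runs on one thread here; the "CPU A/B" labels only mark
 * which side would perform each step in the real race.
 */
static unsigned long lost_update_demo(void)
{
    unsigned long flags = 1UL << PG_LOCKED;   /* page starts locked */

    unsigned long tmp = flags;                /* CPU A: load the word */
    flags |= 1UL << PG_REFERENCED;            /* CPU B: set an unrelated bit */
    tmp &= ~(1UL << PG_LOCKED);               /* CPU A: modify its stale copy */
    flags = tmp;                              /* CPU A: store, B's bit is lost */

    return flags;
}

/* Atomic read-modify-write that clears bit nr in *addr. */
static void clear_bit_sketch(int nr, atomic_ulong *addr)
{
    atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_relaxed);
}

/* Atomic read-modify-write that sets bit nr in *addr. */
static void set_bit_sketch(int nr, atomic_ulong *addr)
{
    atomic_fetch_or_explicit(addr, 1UL << nr, memory_order_relaxed);
}

/* Same two updates as above, but each one is indivisible. */
static unsigned long atomic_demo(void)
{
    atomic_ulong flags;

    atomic_init(&flags, 1UL << PG_LOCKED);
    set_bit_sketch(PG_REFERENCED, &flags);
    clear_bit_sketch(PG_LOCKED, &flags);
    return atomic_load(&flags);
}
```

In the hand-replayed version the referenced bit set by the other side
disappears, which is the kind of page->flags corruption described above.
With the atomic operations both updates survive in whichever order they
happen.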

>we have a number of other places that rely on clear_bit() being atomic -

Note: clear_bit _has_ to be atomic (if it didn't need to be atomic, we
wouldn't be using the load-locked/store-conditional construct).

_All_ places where we use clear_bit rely on clear_bit being atomic
(otherwise it would be faster to clear the bit in C).

>if this isnt true anymore on Alpha then we could make it nonatomic on x86

clear_bit _is_ atomic on Alpha (and definitely also on Sparc64), but it's
not a memory barrier there.

On IA32 clear_bit happens to be a memory barrier too, but that's just a
side effect of how we have to implement atomic memory accesses on the IA32
architecture.
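
The difference between "atomic" and "memory barrier" can be sketched in
userspace C11, where the two properties are requested separately. This
is only an analogy under stated assumptions: struct page_sketch, its
fields and the three functions are hypothetical names, and release
ordering stands in for whatever barrier a given architecture would
actually need.

```c
#include <stdatomic.h>

enum { PG_LOCKED = 0 };   /* hypothetical bit number */

struct page_sketch {
    atomic_ulong flags;
    int data;             /* stands in for state guarded by the lock bit */
};

/*
 * Atomic, but with no ordering guarantee: on a weakly ordered CPU an
 * earlier store to p->data may become visible to other CPUs only after
 * the lock bit is already seen as clear.
 */
static void unlock_relaxed(struct page_sketch *p)
{
    atomic_fetch_and_explicit(&p->flags, ~(1UL << PG_LOCKED),
                              memory_order_relaxed);
}

/*
 * Atomic and ordered: release semantics keep earlier stores from being
 * reordered after the clearing of the lock bit.
 */
static void unlock_ordered(struct page_sketch *p)
{
    atomic_fetch_and_explicit(&p->flags, ~(1UL << PG_LOCKED),
                              memory_order_release);
}

/* Returns 1 if the lock bit is set, 0 otherwise. */
static int page_locked(struct page_sketch *p)
{
    unsigned long f = atomic_load_explicit(&p->flags, memory_order_acquire);

    return (int)((f >> PG_LOCKED) & 1UL);
}
```

Both unlock variants leave the word itself consistent. They differ only
in what other CPUs may observe about surrounding memory accesses, which
is why code that depends on the ordering cannot rely on atomicity alone.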

>as well - this would be a great micro-optimization eg. in the ext2fs code
>[and other places]. We attempted this in early 2.3 but it was an obviously
>broken thing.

It would still be wrong (we would end up corrupting page->flags).

Andrea


This archive was generated by hypermail 2b29 : Thu Aug 31 2000 - 21:00:22 EST