Re: GCC proposal for "@" asm constraint

From: Andrea Arcangeli (andrea@suse.de)
Date: Fri Sep 22 2000 - 11:37:11 EST


This patch fixes the spinlock problems in read_lock/write_lock, plus an alpha SMP
race where clear_bit isn't enforcing a memory barrier, and adds an improvement in
some places where we can check that the waitqueue is not empty before entering
wake_up. There are some minor alpha compile fixes too.
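
To give an idea of the wake_up part, a minimal sketch (not the patch itself:
the call site is invented and I'm assuming the usual waitqueue_active()
helper):

#include <linux/wait.h>

/*
 * Cheap unlocked check first: only call the real wake_up() when there
 * is actually somebody sleeping on the queue.
 */
static inline void maybe_wake_up(wait_queue_head_t *q)
{
	if (waitqueue_active(q))
		wake_up(q);
}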

On the alpha side I also removed the conditional branches in set_bit/clear_bit
(this was running faster in my userspace testcase). Anyway, I made that a
compile-time #define/#undef decision, so returning to the previous behaviour is
trivial. I also removed the mb() in the UP case from the test_and_set_bit-like
functions.
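
For reference, the branchless alpha fast path looks roughly like this (a
sketch in the spirit of the patch, with the compile-time switch left out;
the only branch left is the stl_c retry and it sits out of line):

static inline void set_bit(unsigned long nr, volatile void * addr)
{
	unsigned long temp;
	volatile int *m = ((volatile int *) addr) + (nr >> 5);

	__asm__ __volatile__(
	"1:	ldl_l %0,%3\n"		/* load-locked the 32bit word */
	"	bis %0,%2,%0\n"		/* set the bit unconditionally */
	"	stl_c %0,%1\n"		/* store-conditional it back */
	"	beq %0,2f\n"		/* lost the reservation: retry */
	".subsection 2\n"
	"2:	br 1b\n"
	".previous"
	:"=&r" (temp), "=m" (*m)
	:"Ir" (1 << (nr & 31)), "m" (*m));
}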

It also avoids using atomic operations for ext2 and minix, which only need
bitops that walk bitmaps larger than sizeof(long) (previously suggested by Ingo).
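
The non-atomic flavour is just a plain read-modify-write, something like
this (sketch assuming the usual __test_and_set_bit naming; the point is
that the fs already serializes those bitmap updates, so no ldl_l/stl_c or
lock prefix is needed):

static inline int __test_and_set_bit(unsigned long nr, volatile void * addr)
{
	unsigned long mask = 1UL << (nr & 31);
	volatile int *m = ((volatile int *) addr) + (nr >> 5);
	int old = *m;

	*m = old | mask;		/* plain load/store, not atomic */
	return (old & mask) != 0;
}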

set_bit/clear_bit don't clobber "memory" (they are __asm__ __volatile__ though),
but they declare the written memory as volatile, so gcc should make sure not to
work on a copy of that memory but on its real location. The other places that
previously used the "fool gcc" trick now declare the memory as volatile instead
(some of the fool-gcc casts weren't declaring the array as volatile, for example
the dummy_lock in rwlock.h). atomic_t is now declared volatile in UP too (for
irqs and a preemptive UP kernel).
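
On IA32 the bitops then look like this (sketch of the style, with the usual
ADDR helper spelled out; note the volatile cast in the memory operand and
that there's no "memory" clobber):

#define ADDR (*(volatile long *) addr)

static inline void set_bit(int nr, volatile void * addr)
{
	__asm__ __volatile__(
		"lock; btsl %1,%0"	/* lock prefix needed on SMP */
		:"=m" (ADDR)		/* the written word, declared volatile */
		:"Ir" (nr));
}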

For atomic_set_mask/atomic_clear_mask I'm not casting the pointer to the memory
to a pointer to volatile, because "memory" is clobbered and that should be
enough to ensure gcc doesn't use a copy of the memory.
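
So atomic_clear_mask stays in the "memory" clobber style, along these lines
(IA32 sketch):

static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
{
	__asm__ __volatile__(
		"lock; andl %0,%1"
		: /* no outputs */
		: "r" (~mask), "m" (*addr)
		: "memory");		/* gcc must discard any cached copy */
}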

Sparc64 now implements a finer-grained wmb, as alpha does (and rmb in the same
way).
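
In other words the sparc64 side becomes something like:

#define membar(type)	__asm__ __volatile__("membar " type : : : "memory")

#define mb()	membar("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad")
#define rmb()	membar("#LoadLoad")	/* orders loads against loads only */
#define wmb()	membar("#StoreStore")	/* orders stores against stores only */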

br_read_lock puts an mb() after acquiring its per-cpu slot. Dave, I think you
could replace that mb() (currently it's an rmb(), which looks wrong for sparc64
even with the current rmb implementation) with membar("#StoreLoad"), but there's
no common-code API for such a fine-grained memory barrier that only orders
stores against loads (mb() is the most fine-grained call at the moment).
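
What I mean is something along these lines (the store_load_mb() name and the
__sparc_v9__ guard are only illustrative, nothing like this is in the tree):

#ifdef __sparc_v9__
#define store_load_mb() __asm__ __volatile__("membar #StoreLoad" : : : "memory")
#else
#define store_load_mb() mb()	/* mb() is the finest common call today */
#endif

br_read_lock would then use store_load_mb() between the store that takes the
per-cpu slot and the load that checks for a pending writer.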

        ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test9-pre5/spinlock-1

The patch is against test9-pre5, but it's been tested only on top of test8-pre5,
with a night of cerberus load plus 100Mbit network transfers in the background
on 2-way SMP IA32. The alpha side has now passed 59 minutes of cerberus overload
as well without any apparent problem (a load of 23 with over 300 Mbytes in swap
on a 2-way SMP machine with 2G of RAM and a flood of input and output TCP at
100Mbit).

The asm generated with the "memory" clobber doesn't seem to be inferior to the
previous asm, btw (I compared the two kernel images). There wasn't a single bug
in the previous asm compared to the new one, though (on IA32).

One thing I'd still like to change (not addressed in the patch) is the eax
constraint on the pointer to the rwlock in read_lock/write_lock when the
spinlock is not a builtin constant. I removed the constraint (using "r") and
that generated better asm. I think it's better to grow the slow path by a few
bytes (doing a push of eax to save it on the stack while we use it to pass the
spinlock pointer to the slow-path function) than to harm the fast path. Using
only the builtin variant should be enough to fix it.
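
The idea is along these lines (only a sketch, not in the patch; the out-of-line
stub layout is illustrative, __read_lock_failed is the existing slow-path helper
and still wants the lock pointer in %eax):

#define read_lock_ptr(rw)						\
	asm volatile(							\
		"lock; subl $1,(%0)\n\t"	/* fast path: any register */	\
		"js 2f\n"						\
		"1:\n"							\
		".section .text.lock,\"ax\"\n"				\
		"2:\tpushl %%eax\n\t"		/* save eax in the slow path */	\
		"movl %0,%%eax\n\t"		/* helper wants the lock in eax */ \
		"call __read_lock_failed\n\t"				\
		"popl %%eax\n\t"					\
		"jmp 1b\n"						\
		".previous"						\
		: : "r" (rw) : "memory")	/* "r" instead of "a" */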

Only the IA32 and alpha architectures have been audited; the others may need
these changes as well.

Andrea


