Re: [PATCH] rw_semaphores, optimisations

From: D.W.Howells (dhowells@astarte.free-online.co.uk)
Date: Sun Apr 22 2001 - 17:52:29 EST


Hello Andrea,

Interesting benchmarks... did you compile the test programs with "make
SCHED=yes" by any chance? Also what other software are you running?

The reason I ask is that, with a full-blown KDE setup running in the
background, I get the following numbers on the rwsem-ro test (XADD-optimised
kernel):

    SCHED: 4615646, 4530769, 4534453 and 4628365
    no SCHED: 6311620, 6312776, 6327772 and 6325508

Also quite stable as you can see.

> (ah and btw the machine is a 2-way PII 450mhz).

Your numbers were "4274607" and "4280280" for this kernel and test. This I
find a little surprising: I'd expect them to be about 10% higher than I get
on my machine, given your faster CPUs.

What compiler are you using? I'm using the following:

   Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
   gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-80)

Something else that I noticed: Playing a music CD appears to improve the
benchmarks all round:-) Must be some interrupt effect of some sort, or maybe
they just like the music...

> rwsem-2.4.4-pre6 + my new generic rwsem (fast path in C inlined)

Linus wants out-of-line generic code only, I believe, which is why I made my
generic code out of line.

I have noticed one glaring potential slowdown in my generic code's down
functions. I've got the following in _both_ fastpaths!:

    struct task_struct *tsk = current;

It shouldn't hurt _too_ much (it's only reg->reg anyway), but it will have an
effect. I'll have to move it and post another patch tomorrow.
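
To illustrate the idea of moving that line, here is a minimal single-threaded
userspace sketch, not the actual patch. Everything in it (struct demo_rwsem,
get_current_demo(), the current_lookups counter and the demo_down_read
functions) is a hypothetical stand-in; only the shape matters: the task
pointer is fetched in the slow path alone, so the uncontended path never pays
for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
struct task_struct { int pid; };

static struct task_struct demo_task = { 1 };
static unsigned long current_lookups;   /* how often "current" was evaluated */

/* Plays the role of the kernel's "current" for this sketch. */
static struct task_struct *get_current_demo(void)
{
    current_lookups++;
    return &demo_task;
}

struct demo_rwsem {
    int writer_active;                  /* non-zero while a writer holds it */
    int readers;                        /* number of active readers */
    struct task_struct *last_waiter;    /* recorded on the slow path only */
};

/* Slow path: the only place that needs to know who the caller is. */
static void demo_down_read_slow(struct demo_rwsem *sem)
{
    struct task_struct *tsk = get_current_demo();

    assert(tsk != NULL);
    sem->last_waiter = tsk;
    /* A real implementation would queue tsk and sleep here. */
}

/* Returns 1 if taken on the fast path, 0 if the slow path was entered. */
static int demo_down_read(struct demo_rwsem *sem)
{
    if (!sem->writer_active) {
        sem->readers++;                 /* fast path: no task lookup at all */
        return 1;
    }
    demo_down_read_slow(sem);
    return 0;
}
```

With no writer active, demo_down_read() leaves current_lookups at zero; the
lookup happens only once a writer forces the slow path. There is no locking or
atomicity here at all, so this shows the code layout and nothing else.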

I've also been comparing the assembly from the two generic spinlock
implementations (having out-of-lined yours in what I think is the way you'd
have done it). I've noticed a number of things:

  (1) My fastpaths have slightly fewer instructions in them

  (2) gcc-2.96-20000731 produces what looks like much less efficient code
      than gcc-snapshot-20010409 (to be expected, I suppose).

  (3) Both compilers do insane things to registers (like in one instruction
      moving %eax to %edx and then moving it back again in the next).

  (4) If _any_ inline assembly is used, the compiler grabs extra chunks of
      stack which it does not then use. It will then pop these into registers
      under some circumstances. It'll also save random registers it doesn't
      clobber under others.

(Basically, I had a lot of frustrating fun playing around with the spinlock
asm constraints trying to avoid the compiler clobbering registers
unnecessarily because of them).
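
For anyone unfamiliar with what the constraints control, here is a small
sketch of an exchange-and-add wrapper using GCC extended asm. It is not taken
from either patch: demo_xadd() is a hypothetical name, and the non-x86
fallback to the __sync_fetch_and_add builtin is only there so that the file
builds elsewhere. The "+r" and "+m" operands tell the compiler exactly which
register and which memory location are read and written, and the clobber list
names only what the instruction really disturbs (memory and the flags), so
the compiler has no reason to treat any other register as lost.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Atomically add delta to *ptr and return the value *ptr held beforehand.
 * Hypothetical helper for illustration only.
 */
static inline int demo_xadd(int *ptr, int delta)
{
    assert(ptr != NULL);
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r" (delta),    /* in: addend, out: old value */
                           "+m" (*ptr)      /* read and written in place */
                         :
                         : "memory", "cc"); /* flags change; act as barrier */
    return delta;
#else
    /* Fallback for other architectures: compiler builtin, same semantics. */
    return __sync_fetch_and_add(ptr, delta);
#endif
}
```

Calling demo_xadd(&counter, 3) on a counter holding 5 returns 5 and leaves 8
in the counter. What code the compiler emits around the asm still depends on
the compiler version.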

I've attached the source file I've been playing with and an example
disassembly dump for your amusement. I used the snapshot gcc to do this (it
emits conditional chunks of code out of line more intelligently than the
older one).

It's also interesting that your generic out-of-line semaphores are faster
given the fact that you muck around with EFLAGS and CLI/STI, and I don't.
Maybe I'm getting hit by an interrupt. I'll have to play around with it and
benchmark it again.

David




