jump/alignment considerations

Marc Lehmann (mlehmann@hildesheim.sgh-net.de)
Fri, 24 Jan 1997 12:59:04 +0100 (MET)

I stumbled over

* The default case (no contention) will result in NO
* jumps for both down() and up().

in semaphore.h

according to intel, a statically predicted forward branch
(always taken), costs 6 cycles, whereas a forward branch that is
predicted wrong (i.e., it is NOT taken) costs more than 12 cycles
on a pentium pro (which is actually true).

When loading modules (and maybe in the kernel), the jump
is often forwards, and it costs >12 cycles on a ppro (~10-40 instructions!),
so it's very costly...

on the other hand, doing an

jg over
jmp __down_failed

will increase cache footprint, but on the ppro this will give a high prefetch
advantage. IF the branch is dynamically predicted, neither version costs a full cycle
(when predicted correctly).
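
To make the layout question concrete, here is a rough C sketch (not the code
from semaphore.h) of a down() whose uncontended path falls straight through.
toy_sem, toy_down, toy_up and toy_down_failed are made-up names, and
__builtin_expect is a GCC/Clang extension that only hints at the expected
direction; whether the compiler really moves the slow path out of line
depends on the compiler and its flags.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel semaphore; not the real struct. */
struct toy_sem {
    int count;      /* 1 = free, 0 = held, negative = contended */
    int slow_hits;  /* how often the contended path was entered */
};

/* GCC/Clang extension: hint that the condition is normally false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Out-of-line slow path, playing the role of __down_failed. */
static void toy_down_failed(struct toy_sem *s)
{
    s->slow_hits++;
    /* a real implementation would sleep here until up() is called */
}

/* Fast path: decrement and fall through when there is no contention. */
void toy_down(struct toy_sem *s)
{
    s->count--;
    if (unlikely(s->count < 0))
        toy_down_failed(s);
}

/* Release: the toy never wakes anybody, it only restores the count. */
void toy_up(struct toy_sem *s)
{
    s->count++;
    assert(s->count <= 1);  /* a toy mutex is never released twice */
}
```

With count starting at 1, the first toy_down() never calls the slow path; only
a second toy_down() without a toy_up() in between does. Looking at the
generated assembly (gcc -O2 -S) shows which of the two layouts discussed above
the compiler actually picked.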


Now to the alignment issue (on the pentium, I still have no data for the ppro).

an alignment of zero doesn't change the runtime compared to an alignment of
4 or 16 bytes, i.e. it neither adds nor subtracts anything.
(I checked dhrystone, gzip, nbench)

In this case, reducing the alignment will significantly reduce the size of the kernel
(and the cache footprint).

Since lea (%eax),%eax instructions are used to create 2-7 byte nops, removing any alignment
will significantly reduce AGIs (address generation interlocks), too, which is probably why
the pentium is not getting slower (it actually gets a slight advantage).
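
For a feeling of how much filler alignment adds, here is a small C sketch of
the arithmetic. pad_bytes and total_pad are hypothetical helpers, and the
addresses in the usage note are invented for illustration, not measured from
a kernel image. total_pad is also a simplification: it treats every address
independently and ignores that inserted filler shifts the labels that follow.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of filler needed to move addr up to the next multiple of align.
 * An align of 0 or 1 means "no alignment" and never needs filler. */
size_t pad_bytes(size_t addr, size_t align)
{
    assert((align & (align - 1)) == 0);  /* alignments are powers of two */
    if (align <= 1)
        return 0;
    return (align - addr % align) % align;
}

/* Sum of the filler for n label addresses, each considered on its own. */
size_t total_pad(const size_t *addrs, size_t n, size_t align)
{
    size_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += pad_bytes(addrs[i], align);
    return sum;
}
```

For the three made-up addresses 0x1003, 0x1010 and 0x1022, a 16 byte alignment
needs 13 + 0 + 14 = 27 bytes of filler, a 4 byte alignment needs 1 + 0 + 2 = 3,
and no alignment needs none.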

next time I will hopefully be able to supply timing data for the ppro, to
justify 16 byte alignments for the ppro.

----==-- _
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / mlehmann@hildesheim.sgh-net.de
-=====/_/_//_/\_,_/ /_/\_\ pcg@goof.com
The choice of a GNU generation