* The default case (no contention) will result in NO
* jumps for both down() and up().
*/
in semaphore.h
According to Intel, a statically predicted forward branch
(always taken) costs 6 cycles, whereas a forward branch that is
predicted wrong (i.e., it is NOT taken) costs more than 12 cycles
on a Pentium Pro (which is actually true).
When loading modules (and maybe in the kernel), the jump
is often forwards, and it costs >12 cycles on a PPro (~10-40 instructions!).
It's very costly...
On the other hand, doing a
	jg over
	jmp __down_failed
over:
will increase the cache footprint, but on the PPro this will give a high prefetch
advantage. IF the branch is dynamically predicted, neither version costs a full cycle
(when predicted correctly).
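
For comparison, here is a minimal C sketch of the same idea: keep the
uncontended path as straight-line code and move the contended case out of the
way. All names (sketch_sem, sketch_down, sketch_down_failed, sketch_unlikely)
are made up for illustration and are not the code from semaphore.h. It assumes
a GCC or Clang compiler that provides __builtin_expect; the hint only suggests
a layout to the compiler and does not guarantee which branch instructions are
emitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the code from semaphore.h. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)   /* GCC/Clang builtin */

struct sketch_sem {
    int count;       /* > 0 means the semaphore is free (assumption) */
    int slow_calls;  /* how often the contended path ran */
};

/* Contended path: stands in for __down_failed; here it only counts calls. */
static inline void sketch_down_failed(struct sketch_sem *sem)
{
    sem->slow_calls++;
}

/* Fast path: decrement, and leave the straight-line code only on contention. */
static inline void sketch_down(struct sketch_sem *sem)
{
    assert(sem != NULL);
    sem->count--;
    if (sketch_unlikely(sem->count < 0))
        sketch_down_failed(sem);
}
```

Calling sketch_down() on a semaphore whose count starts at 1 takes only the
fast path; a second call drives the count negative and runs the contended path
once.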
-----------------------------------------------------------------------------
Now to the alignment issue (on the Pentium; I still have no data for the PPro).
Compared with an alignment of 4 or 16 bytes, an alignment of zero doesn't do
anything to the runtime, i.e. it neither adds nor subtracts anything.
(I checked dhrystone, gzip, nbench)
In this case, reducing the alignment will significantly reduce the size of the kernel
(and the cache footprint).
Since lea (%eax),%eax's are used to create 2-7 byte nops, removing any alignment
will significantly reduce AGIs, too, which is probably why the Pentium is not getting slower
(it actually gets a slight advantage).
Next time I will hopefully be able to supply timing data for the PPro, to
justify 16-byte alignments for the PPro.
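
To put a number on the size cost of alignment, here is a small C sketch that
computes how many padding bytes are needed in front of a label. The helper name
sketch_pad_bytes and the example address are made up for illustration; the
alignment is given in bytes and is assumed to be a power of two, with 1 meaning
"no alignment".

```c
#include <assert.h>

/* Hypothetical helper: bytes of padding needed to bring addr up to the next
 * multiple of align. align is assumed to be a power of two; 1 means no
 * alignment at all. */
static inline unsigned long sketch_pad_bytes(unsigned long addr,
                                             unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (addr & (align - 1))) & (align - 1);
}
```

For a made-up label at address 0x1003 this gives 13 bytes of padding for
16-byte alignment, 1 byte for 4-byte alignment, and none for an alignment of 1.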
-----==-
----==-- _
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / mlehmann@hildesheim.sgh-net.de
-=====/_/_//_/\_,_/ /_/\_\ pcg@goof.com
The choice of a GNU generation