Re: Semaphore assembly-code bug

From: dean gaudet
Date: Mon Nov 01 2004 - 23:01:59 EST

On Mon, 1 Nov 2004, linux-os wrote:

> On Mon, 1 Nov 2004, dean gaudet wrote:
> > On Sun, 31 Oct 2004, linux-os wrote:
> >
> > > Timer overhead = 88 CPU clocks
> > > push 3, pop 3 = 12 CPU clocks
> > > push 3, pop 2 = 12 CPU clocks
> > > push 3, pop 1 = 12 CPU clocks
> > > push 3, pop none using ADD = 8 CPU clocks
> > > push 3, pop none using LEA = 8 CPU clocks
> > > push 3, pop into same register = 12 CPU clocks
> >
> > your microbenchmark makes assumptions about rdtsc which haven't been valid
> > since the days of the 486. rdtsc has serializing aspects and overhead that
> > you can't just eliminate by running it in a tight loop and subtracting out
> > that "overhead".
> >
> Wrong.

if you were correct then i should be able to measure 1 cycle differences
in sequences such as the following:

mov %eax,%edi
shr $1,%ecx

mov %eax,%edi
shr $1,%ecx
shr $1,%ecx

...

mov %eax,%edi
shr $1,%ecx
shr $1,%ecx
shr $1,%ecx
shr $1,%ecx
shr $1,%ecx
shr $1,%ecx
shr $1,%ecx
shr $1,%ecx

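yet the attached program demonstrates that such measurements are
inaccurate. it times the sequence with 1 through 8 shr instructions; if
the rdtsc overhead could simply be subtracted out, the results would be
a sequence of numbers increasing by 1 each time. instead: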

p4 model 2:   80  80  84  84  84  84  84  84
p4 model 3:  120 120 120 120 120 120 120 128
p-m model 9:  47  46  47  48  49  50  56  57
k8:            5   5   5   5   5   5   5   5
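
a minimal sketch, not part of the attached program, of how the rows above
can be checked against the "increasing by 1" expectation. the helper
unit_steps() and the array names are hypothetical; the numbers are copied
from three of the rows above. with 8 measurements there are 7 steps, so a
perfectly accurate measurement would score 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* measured minimum cycle counts, copied from the results above */
const uint32_t p4_model2[8] = { 80, 80, 84, 84, 84, 84, 84, 84 };
const uint32_t pm_model9[8] = { 47, 46, 47, 48, 49, 50, 56, 57 };
const uint32_t k8[8]        = {  5,  5,  5,  5,  5,  5,  5,  5 };

/*
 * hypothetical helper: count how many adjacent pairs in v[0..n-1]
 * differ by exactly +1.  the subtraction is unsigned, so a decrease
 * wraps to a large value and is never counted as a unit step.
 */
size_t unit_steps(const uint32_t *v, size_t n)
{
	size_t i, hits = 0;

	assert(v != NULL || n == 0);
	for (i = 1; i < n; ++i) {
		if (v[i] - v[i - 1] == 1) {
			++hits;
		}
	}
	return hits;
}
```

calling unit_steps(pm_model9, 8) counts the steps of exactly one cycle in
the p-m row; the p4 model 2 and k8 rows contain no such step at all.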


% gcc -O -o rdtsc-rounding rdtsc-rounding.c


#include <stdio.h>
#include <stdint.h>

/* foo<n>: time a mov followed by n dependent shr instructions with rdtsc */
#define template(n) \
static uint32_t foo##n(void) \
{ \
	uint32_t start, done, trash1, trash2; \
	__asm volatile( \
		"\n	rdtsc" \
		"\n	mov %%eax,%0" \
		x##n("\n	shr $1,%1") \
		"\n	rdtsc" \
		: "=&r" (start), "=&r" (trash1), "=&a" (done), "=&d" (trash2) \
	); \
	return done - start; \
}

#define x1(x) x
#define x2(x) x x
#define x3(x) x x x
#define x4(x) x2(x) x2(x)
#define x5(x) x4(x) x
#define x6(x) x3(x2(x))
#define x7(x) x6(x) x
#define x8(x) x4(x2(x))
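
/* instantiate foo1 .. foo8 */
template(1)
template(2)
template(3)
template(4)
template(5)
template(6)
template(7)
template(8)

static uint32_t (*fn[9])(void) = {
	0, foo1, foo2, foo3, foo4, foo5, foo6, foo7, foo8
};
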
/* run f many times and keep the smallest cycle count seen */
static uint32_t bench(uint32_t (*f)(void))
{
	uint32_t best;
	unsigned i;

	best = ~0;
	for (i = 0; i < 100000; ++i) {
		uint32_t cur = f();
		if (cur < best) {
			best = cur;
		}
	}
	return best;
}

int main(int argc, char **argv)
{
	unsigned i;

	/* fn[0] is a placeholder, so start at 1 */
	for (i = 1; i < sizeof(fn)/sizeof(fn[0]); ++i) {
		printf("%u ", bench(fn[i]));
	}
	printf("\n");
	return 0;
}