> code block A:
>
> ASM (" movw %ds, %ax; movw %es, %bx");
> ASM (" movw %ds, %ax; movw %es, %bx");
> ASM (" movw %ds, %ax; movw %es, %bx");
> ASM (" movw %ds, %ax; movw %es, %bx");
> ASM (" movw %ds, %ax; movw %es, %bx");
>
> executes in 19 cycles. block B:
>
> ASM (" nop; movl %ds, %ax; nop; nop; movl %es, %bx");
> ASM ("nop; nop; movl %ds, %ax; nop; nop; movl %es, %bx");
> ASM ("nop; nop; movl %ds, %ax; nop; nop; movl %es, %bx");
> ASM ("nop; nop; movl %ds, %ax; nop; nop; movl %es, %bx");
> ASM ("nop; nop; movl %ds, %ax; nop; nop; movl %es, %bx");
>
> executes in _20_ cycles only, although it has 19 more nop's in it, and has
> +25% more code length than block A...
>
> so, in case the test program is correct, there seems to be some nontrivial
> cost wrt. 'movw %ds,<reg>' on pentiums, but it's not a pairing issue. [and
> it's neither some sort of register collision issue, access to es,ds,ax,bx
> is interleaved properly.]
>
> the testcode has to be run as root, it should produce something like this:
>
> [root@hell asm]# ./test2
> best pushw latency: 19 cycles
> best pushl latency: 20 cycles

with a [true;] Pentium OverDrive PODP-83 I got

    best pushw latency: 30 cycles
    best pushl latency: 19 cycles

remember that this weird CPU was the reason why I started that bogomips thread
about alignment of the delay loop and (not) inlining it.

looks like this is yet another example of weird differences between
real Pentium and Pentium OverDrive...
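
For anyone who wants to repeat the comparison on other CPUs, here is a minimal sketch of a best-of-N cycle measurement. It is not the test program from the quoted mail. The names read_tsc, block_empty, block_movw, block_movl, best_latency and net_latency are made up for illustration. It assumes GCC-style inline asm on x86, and it takes the minimum over many runs instead of whatever the original does as root. The mov instructions are written without a size suffix so that the assembler picks 16 or 32 bits from the register operand, and the compiler picks the registers instead of ax/bx.

```c
#include <stdint.h>

#if !defined(__i386__) && !defined(__x86_64__)
#error "sketch assumes an x86 CPU with a time-stamp counter (Pentium or later)"
#endif

typedef void (*block_fn)(void);

/* Read the time-stamp counter with RDTSC (result arrives in EDX:EAX). */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Empty block, used to estimate the call and RDTSC overhead. */
static void block_empty(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Like block A: five pairs of 16-bit reads of %ds and %es. */
static void block_movw(void)
{
    unsigned int a, b;
    __asm__ __volatile__(
        "mov %%ds, %w0\n\tmov %%es, %w1\n\t"
        "mov %%ds, %w0\n\tmov %%es, %w1\n\t"
        "mov %%ds, %w0\n\tmov %%es, %w1\n\t"
        "mov %%ds, %w0\n\tmov %%es, %w1\n\t"
        "mov %%ds, %w0\n\tmov %%es, %w1\n\t"
        : "=r"(a), "=r"(b));
    (void)a;
    (void)b;
}

/* Like block B: 32-bit reads padded with 19 nops in total. */
static void block_movl(void)
{
    unsigned int a, b;
    __asm__ __volatile__(
        "nop\n\tmov %%ds, %k0\n\tnop\n\tnop\n\tmov %%es, %k1\n\t"
        "nop\n\tnop\n\tmov %%ds, %k0\n\tnop\n\tnop\n\tmov %%es, %k1\n\t"
        "nop\n\tnop\n\tmov %%ds, %k0\n\tnop\n\tnop\n\tmov %%es, %k1\n\t"
        "nop\n\tnop\n\tmov %%ds, %k0\n\tnop\n\tnop\n\tmov %%es, %k1\n\t"
        "nop\n\tnop\n\tmov %%ds, %k0\n\tnop\n\tnop\n\tmov %%es, %k1\n\t"
        : "=r"(a), "=r"(b));
    (void)a;
    (void)b;
}

/* Smallest TSC delta seen over `runs` executions of fn. */
static uint64_t best_latency(block_fn fn, int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = read_tsc();
        fn();
        uint64_t t1 = read_tsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Best latency of fn minus the best latency of the empty block. */
static uint64_t net_latency(block_fn fn, int runs)
{
    uint64_t base = best_latency(block_empty, runs);
    uint64_t raw = best_latency(fn, runs);
    return raw > base ? raw - base : 0;
}
```

Calling net_latency(block_movw, 100000) and net_latency(block_movl, 100000) and printing both values gives a comparison in the same spirit as the numbers above. On current CPUs the TSC ticks at a fixed reference rate rather than in core clock cycles, so the absolute values are not comparable to the Pentium figures.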
Harald
-- 
All SCSI disks will from now on be required to send an email notice
24 hours prior to complete hardware failure!

Harald Koenig, Inst.f.Theoret.Astrophysik
koenig@tat.physik.uni-tuebingen.de