Re: Pentium II optimization (clc vs testl)

Manfred Spraul (manfreds@colorfullife.com)
Sun, 12 Sep 1999 17:11:22 +0000


This is a multi-part message in MIME format.
--------------EBACE1E5F08E80AB004C2589
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Shannon wrote:
> I got curious and decided to run Mingo's test code.
>
> On my dual PPro machine, I get 23 cycles all the time with clc.
> However, testl gives me 17, 26, and 85.
I think 24 instructions is far to short for an accurate benchmark.
Could you try my program?

> Is testl affected by SMP?
No, [except spurious bus traffic, this could cause the random increase]
I try to synchronize with the timer interrupt by calling "sleep(1)", so
my numbers are usually identical between all runs.

mingo, I've rerun the test, and the results make it obvious that such
tests are nearly useless: IIRC, the PPro/PII has 1 full instruction
decoder and 2 "auxiliary" decoders [they can only handle single-my-op
instructions]. The pairing capability depends on the exact instruction
stream.

300 * clc: ~ 300 ticks
150 *(clc;dummy instruction): ~ 180 ticks
100 *(clc;dummy;dummy): ~ 152 ticks

300 * testl: ~ 150 ticks
150 * (testl;dummy) ~ 150 ticks
100 * (testl;dummy;dummy) ~ 200 ticks.

--> as you can see, sometimes clc is faster, sometimes testl

Btw, I'm curious: Is someone out there with an Atlon?

--
	Manfred
--------------EBACE1E5F08E80AB004C2589
Content-Type: text/plain; charset=us-ascii;
 name="timetest.cpp"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="timetest.cpp"

/* * timetest.cpp: RDTSC based performance tester. * * Copyright (C) 1999 by Manfred Spraul. * * Redistribution of this file is permitted under the terms of the GNU * Public License (GPL) * $Header: /pub/cvs/ms/timetest/timetest.cpp,v 1.5 1999/09/12 17:10:12 manfreds Exp $ */

#include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h>

char sample[4096];

// Intel recommends that a serializing instruction // should be called before and after rdtsc. // CPUID is a serializing instruction. #define read_rdtsc(time) \ __asm__ __volatile__( \ "cpuid\n\t" \ "rdtsc\n\t" \ "mov %%eax,(%0)\n\t" \ "mov %%edx,4(%0)\n\t" \ "cpuid\n\t" \ : /* no output */ \ : "S"(&time) \ : "eax", "ebx", "ecx", "edx", "memory") #define BUILD_TESTFNC(name, text, instructions) \ static void name(int show) \ { \ unsigned long long time; \ unsigned long long time2; \ \ read_rdtsc(time); \ instructions; \ read_rdtsc(time2); \ if(show) \ printf("total time for " text ": %Ld ticks.\n", \ time2-time); \ } \

#define DO_50(x) \ x; x; x; x; x; x; x; x; \ x; x; x; x; x; x; x; x; \ x; x; x; x; x; x; x; x; \ x; x; x; x; x; x; x; x; \ x; x; x; x; x; x; x; x; \ \ x; x; x; x; x; x; x; x; \ x; x

#define DO_100(x) do { \ DO_50(x); \ DO_50(x);} while(0)

#define DO_300(x) do { \ DO_50(x); \ DO_50(x); \ DO_50(x); \ DO_50(x); \ DO_50(x); \ DO_50(x);} while(0)

#define DO_150(x) do { \ DO_100(x); \ DO_50(x);} while(0)

//////////////////////////////////////////////////////////////////////////////

BUILD_TESTFNC(zerotest,"zerotest", (void)0)

#define CLC __asm__ __volatile__ ("clc\n\t" : : : "memory") #define TESTL __asm__ __volatile__ ("testl %%eax, %%eax \n\t" : : : "eax", "memory") #define DUMMY __asm__ __volatile__ ("movl %%edx, %%ecx \n\t" : : : "ecx", "edx", "memory") #define DUMMY2 __asm__ __volatile__ ("movl %%esi, %%edi \n\t" : : : "esi", "edi", "memory")

BUILD_TESTFNC(mingotest1, "clc-test", DO_300(CLC)); BUILD_TESTFNC(mingotest2, "testl-test", DO_300(TESTL)); BUILD_TESTFNC(mingotest3, "clc+dummy-test", DO_150(CLC;DUMMY)); BUILD_TESTFNC(mingotest4, "testl+dummy-test", DO_150(TESTL;DUMMY)); BUILD_TESTFNC(mingotest5, "clc+2*dummy-test", DO_100(CLC;DUMMY;DUMMY2)); BUILD_TESTFNC(mingotest6, "testl+2*dummy-test", DO_100(TESTL;DUMMY;DUMMY2));

int main() {

if(geteuid() == 0) { int res = nice(-20); if(res < 0) { perror("nice(-20)"); return 1; } printf("TIMETEST, reniced to (-20).\n"); } else { printf("TIMETEST called by non-superuser, running with normal priority.\n"); } sleep(1); zerotest(0); zerotest(1); sleep(1); zerotest(0); zerotest(1); sleep(1); zerotest(0); zerotest(1);

sleep(1); mingotest1(0); mingotest1(1); sleep(1); mingotest1(0); mingotest1(1); sleep(1); mingotest1(0); mingotest1(1);

sleep(1); mingotest2(0); mingotest2(1); sleep(1); mingotest2(0); mingotest2(1); sleep(1); mingotest2(0); mingotest2(1);

sleep(1); mingotest3(0); mingotest3(1); sleep(1); mingotest3(0); mingotest3(1); sleep(1); mingotest3(0); mingotest3(1);

sleep(1); mingotest4(0); mingotest4(1); sleep(1); mingotest4(0); mingotest4(1); sleep(1); mingotest4(0); mingotest4(1);

sleep(1); mingotest5(0); mingotest5(1); sleep(1); mingotest5(0); mingotest5(1); sleep(1); mingotest5(0); mingotest5(1);

sleep(1); mingotest6(0); mingotest6(1); sleep(1); mingotest6(0); mingotest6(1); sleep(1); mingotest6(0); mingotest6(1);

return 0; }

--------------EBACE1E5F08E80AB004C2589--

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/