Re: Pentium II optimization (clc vs testl)

Manfred Spraul (manfreds@colorfullife.com)
Fri, 10 Sep 1999 17:29:35 +0000


This is a multi-part message in MIME format.
--------------9B6BCFB2FA4FDAACC0360BF1
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

mingo@chiara.csoma.elte.hu wrote:
> The BARRIER thing might look curious, but rdtsc has to be shielded
> from the measured section, otherwise rdtsc's uops might mix up and
> interact with the measured section - causing false results.

IIRC, Intel recommends that a "CPUID" instruction should be used:
it's a guaranteed serializing instruction.
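
A minimal sketch of that recommendation, assuming GCC-style inline
assembly on x86: the hypothetical helper serialized_rdtsc() brackets
rdtsc with cpuid and returns the counter as a single 64-bit value.
Zeroing eax before each cpuid is an assumption of this sketch (so that
the same cpuid leaf runs every time), not something taken from Intel's
text. The non-x86 branch is only there so the sketch still builds
elsewhere.

```cpp
#include <cstdint>

#if defined(__i386__) || defined(__x86_64__)
// cpuid; rdtsc; cpuid: the counter is read between two serializing
// instructions, so neither earlier nor later code can overlap with it.
inline std::uint64_t serialized_rdtsc()
{
	std::uint32_t lo, hi;
	__asm__ __volatile__(
		"xorl %%eax, %%eax\n\t"   // assumption: always use cpuid leaf 0
		"cpuid\n\t"               // serialize before the read
		"rdtsc\n\t"               // counter arrives in edx:eax
		"movl %%eax, %0\n\t"      // save it before cpuid overwrites eax/edx
		"movl %%edx, %1\n\t"
		"xorl %%eax, %%eax\n\t"
		"cpuid\n\t"               // serialize again before the measured code
		: "=r"(lo), "=r"(hi)
		: /* no input */
		: "eax", "ebx", "ecx", "edx", "memory");
	return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
#else
#include <chrono>
// Fallback for non-x86 builds: a monotonic clock, not a cycle counter.
inline std::uint64_t serialized_rdtsc()
{
	return static_cast<std::uint64_t>(
		std::chrono::steady_clock::now().time_since_epoch().count());
}
#endif
```

Calling serialized_rdtsc() once before and once after the measured
section and subtracting the two values gives the elapsed ticks,
including the overhead of the helper itself.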

> Anyway, i previously measured the
> overhead of clc vs. testl %X, %X before posting, and the testl version
> performs better here - maybe you can explain why. I've attached the code,
> the OVERHEAD #define is hand-tailored (with empty measured section the
> result should be 0 cycles) to my box - this can be different on other
> boxes.
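
Instead of hand-tailoring the overhead constant, it could be calibrated
at run time: time an empty section several times, take the smallest
result as the overhead, and subtract it from later measurements. The
sketch below shows only that arithmetic. The helper names min_ticks()
and net_ticks() are hypothetical, the sample values are made up for
illustration and are not measurements, and it assumes a C++14 compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Smallest tick count among n empty-section measurements (n must be > 0).
// The minimum is used because interrupts and cache misses only ever add time.
constexpr std::uint64_t min_ticks(const std::uint64_t *samples, std::size_t n)
{
	std::uint64_t best = samples[0];
	for (std::size_t i = 1; i < n; ++i)
		if (samples[i] < best)
			best = samples[i];
	return best;
}

// Raw measurement minus the calibrated overhead, clamped at zero so that
// a lucky run cannot produce a huge unsigned value.
constexpr std::uint64_t net_ticks(std::uint64_t raw, std::uint64_t overhead)
{
	return raw > overhead ? raw - overhead : 0;
}

// Hypothetical empty-section results, for illustration only.
constexpr std::uint64_t example_empty_runs[] = { 71, 68, 68, 95 };
constexpr std::uint64_t example_overhead = min_ticks(example_empty_runs, 4);
```

With these made-up values the overhead is 68 ticks, so a raw reading of
172 ticks would be reported as net_ticks(172, example_overhead), that
is, 104 ticks.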

I used a similar program, and I think the answer is simple:
clc instructions do not pair with themselves, but they do pair
(sometimes?) with other instructions.

Even so, clc is still slightly slower than testl.
I've attached my program.
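
To make the comparison concrete, the tick counts quoted in the comments
of the attached program can be turned into ticks per instruction. This
is only a sketch of the arithmetic: the names Run, kRuns and
ticks_per_instr() are hypothetical, and it assumes that the quoted
counts cover exactly 200 repetitions of each instruction mix.

```cpp
#include <cstdio>

struct Run {
	const char *what;         // instruction mix, repeated 200 times
	unsigned ticks;           // tick count quoted in timetest.cpp
	unsigned instrs_per_rep;  // 1 for clc/testl alone, 2 with the dummy movl
};

constexpr unsigned kReps = 200;

constexpr Run kRuns[] = {
	{ "clc",          199, 1 },
	{ "testl",        104, 1 },
	{ "clc + movl",   239, 2 },
	{ "testl + movl", 200, 2 },
};

// Average ticks spent per executed instruction in one run.
constexpr double ticks_per_instr(const Run &r)
{
	return static_cast<double>(r.ticks) / (kReps * r.instrs_per_rep);
}

// Prints one line per run with the raw and the per-instruction figures.
void print_ticks_per_instr()
{
	for (const Run &r : kRuns)
		std::printf("%-14s %3u ticks  %.2f ticks/instr\n",
			    r.what, r.ticks, ticks_per_instr(r));
}
```

Read this way, 200 clc's cost about one tick each, while adding 200
movl's to them costs only 40 more ticks, which fits the idea that clc
overlaps with other instructions but not with itself. The testl runs
stay at roughly half a tick per instruction in both cases.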

--
	Manfred
--------------9B6BCFB2FA4FDAACC0360BF1
Content-Type: text/plain; charset=us-ascii;
 name="timetest.cpp"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="timetest.cpp"

/*
 * timetest.cpp: CPUID based performance tester.
 *
 * Copyright (C) 1999 by Manfred Spraul.
 *
 * Redistribution of this file is permitted under the terms of the GNU
 * Public License (GPL)
 * $Header: /pub/cvs/ms/timetest/timetest.cpp,v 1.2 1999/09/10 17:29:51 manfreds Exp $
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

char sample[4096];

// Intel recommends that a serializing instruction
// should be called before and after rdtsc.
// CPUID is a serializing instruction.
#define read_rdtsc(time) \
	__asm__ __volatile__( \
		"cpuid\n\t" \
		"rdtsc\n\t" \
		"mov %%eax,(%0)\n\t" \
		"mov %%edx,4(%0)\n\t" \
		"cpuid\n\t" \
		: /* no output */ \
		: "S"(&time) \
		: "eax", "ebx", "ecx", "edx", "memory")

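static void zerotest()
{
	unsigned long long time;
	unsigned long long time2;

	// Two back-to-back reads: measures the cost of read_rdtsc itself.
	read_rdtsc(time);
	read_rdtsc(time2);
	// %llu is the standard conversion for unsigned long long.
	printf("total time for zerotest: %llu ticks.\n", time2-time);
}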

static void mingotest(int show)
{
	unsigned long long time;
	unsigned long long time2;

	read_rdtsc(time);
#define CLC __asm__ __volatile__ ("clc\n\t" : : : "memory")
#define TESTL __asm__ __volatile__ ("testl %%esi, %%esi \n\t" : : : "esi", "memory")
#define DUMMY __asm__ __volatile__ ("movl %%esi, %%edi \n\t" : : : "esi", "edi", "memory")

// test 1: 200 CLC's: 199 ticks
// #define INSTR CLC
// test 2: 200 TESTL: 104 ticks
#define INSTR TESTL
// test 3: 200 CLC's and dummy instructions: 239 ticks
// #define INSTR CLC; DUMMY
// test 4: 200 TESTL's and dummy instructions: 200 ticks
// #define INSTR TESTL; DUMMY

#define INSTR50 \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	\
	INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; INSTR; \
	INSTR; INSTR

	// 4 x 50 = 200 copies of the instruction under test.
	INSTR50; INSTR50; INSTR50; INSTR50;

	read_rdtsc(time2);
	if(show)
		printf("total time for 200 INSTR: %llu ticks.\n", time2-time);
}

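int main()
{
	if(geteuid() == 0) {
		// nice() returns -1 on failure. On success it returns 0 (raw
		// syscall, old libc) or the new nice value (current glibc),
		// so a "res < 0" check would misreport a successful nice(-20).
		int res = nice(-20);
		if(res == -1) {
			perror("nice(-20)");
			return 1;
		}
		printf("MOVETEST, reniced to (-20).\n");
	} else {
		printf("MOVETEST called by non-superuser, running with normal priority.\n");
	}
	sleep(1);
	zerotest();
	zerotest();
	sleep(1);
	zerotest();
	zerotest();
	sleep(1);
	zerotest();
	zerotest();

	// The first call of each pair warms up the caches; only the second prints.
	sleep(1);
	mingotest(0);
	mingotest(1);
	sleep(1);
	mingotest(0);
	mingotest(1);
	sleep(1);
	mingotest(0);
	mingotest(1);
	sleep(1);
	mingotest(0);
	mingotest(1);
	return 0;
}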

--------------9B6BCFB2FA4FDAACC0360BF1--

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/