2.0.x and 2.1.x SMP performances behavior (UPDATED)

Claude Gamache (cgamache@cae.ca)
03 Sep 1998 15:08:41 -0400

In reply to: Linus Torvalds: "Re: 2.0.x SMP performances compared to 2.1.x"

Linus Torvalds <torvalds@transmeta.com> writes:

Sorry for the delay in responding to your email.

> It was my braindamage, pause() was fine, and obviously the correct thing
> to do. I was confusing you guys with some other people that just wanted to
> force a reschedule and keep going, and used sleep(0) rather than pause for
> that..

No problem; you deal with so many people that it is easy to
understand.

> Which still leaves me with no clue on how to fix it, because I have no
> test-case. Can you make your program available to me (I'm really nervous
> about getting binaries over the internet, and you may be nervous about
> making sources available, but maybe we can do something where I promise to
> not show them to anybody else and delete them after I've used them?)
>
> Linus

Unfortunately, we cannot provide either the sources or the binaries.
Instead, I wrote a little program (included at the end of this
message) that behaves differently under kernel 2.0.35 and kernel
2.1.119 (both with the SMP patch for procps 1.2.7 applied). It is
provided under the GPL, so you are completely free to do whatever you
want with it.

The program is silly: it just performs computations in order to eat
CPU cycles, so that we can see how the system behaves when only one
instance is run and when many instances are run.

The code is organized in this way:

    set_interrupt_handler   /* for SIGTERM and SIGINT signals */
    set_timer               /* 10 ms timer */

    while ( 1 ) {

        running = 1         /* to indicate the computations have started */

        /*
          main computations section
        */

        running = 0         /* to indicate the computations have completed */

        pause();            /* wait for next interrupt */
    }

timer_handler()

timer_handler(): When called, if running == 1, the computations did
not complete within the allowed period, so we log this as an overrun
and return. It is done this way in order to produce output
periodically in "real time" (soft real time).
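
The overrun rule described above can be sketched in isolation as follows. This is only an illustration, not the prog1 code itself: the names on_tick, install_tick_handler and simulate are hypothetical, and the timer ticks are simulated with raise() so that the outcome does not depend on machine speed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Flags shared with the handler; sig_atomic_t keeps each access atomic. */
static volatile sig_atomic_t running  = 0;  /* 1 while the work section is active */
static volatile sig_atomic_t overruns = 0;  /* ticks that arrived during the work */

/* Tick handler: a tick that finds running == 1 means the work overran. */
void on_tick(int signo)
{
    (void)signo;
    if (running)
        overruns++;
}

/* Install on_tick for SIGALRM; returns 0 on success, -1 on failure. */
int install_tick_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Hypothetical driver: simulate `frames` periods, with a tick raised
   in the middle of the work in the first `late` of them.
   Returns the number of overruns counted. */
int simulate(int frames, int late)
{
    overruns = 0;
    for (int f = 0; f < frames; f++) {
        running = 1;
        if (f < late)
            raise(SIGALRM);   /* tick arrives before the work is done */
        running = 0;
        raise(SIGALRM);       /* tick arrives while idle: not an overrun */
    }
    return (int)overruns;
}
```

After install_tick_handler() succeeds, simulate(10, 3) reports 3 overruns: only the ticks that land while running is set are counted.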

You may need to change the number of iterations of the "k" loop
(MAX_K) if your computer is very fast.

Our tests were done with the following computers/setup:

A- IBM Intellistation Z Pro
   dual PPro 200 MHz (256 KB cache)
   128 MB RAM
   Red Hat 5.1, glibc 2, gcc 2.7.2.3
   kernel 2.0.35 SMP and kernel 2.1.119 SMP

B- Dell Workstation 410
   dual Pentium II 400 MHz (512 KB cache)
   256 MB RAM
   Red Hat 5.1, glibc 2, gcc 2.7.2.3
   kernel 2.0.35 SMP

C- Dell Workstation 410
   dual Pentium II 400 MHz (512 KB cache)
   256 MB RAM
   Slackware 3.5.0, libc 5.4.44 (libc5), egcs 1.0.3
   kernel 2.0.35 SMP

D- Dell Optiplex GXPro
   single PPro 200 MHz (256 KB cache)
   64 MB RAM
   Slackware 3.3, libc 5.4.44 (libc5), egcs 1.0.2
   kernel 2.0.35 (non SMP, SMP=0 in makefile)

The test procedure was as follows:

1- Boot the system with the standard configuration (according to the
   distribution).

2- In a virtual console, we launch 1 instance of prog1.

3- We measure, a few times, the time prog1 takes to complete 1500
   iterations. It should take 15.0 seconds without any overruns (you
   have to adjust VECNUM and MAX_K according to your computer).

4- In a second virtual console, we launch 10 other instances of prog1
   while the first one is still running (with a dual PII, we launch
   between 50 and 100 other instances of prog1).

5- When all instances are running, we can observe in the first
   console that prog1 takes more time to complete its 1500 iterations
   and incurs overruns. So far, the behavior is correct. On kernel
   2.1.119 the CPU load is well balanced (all prog1 instances take
   about the same CPU percentage, according to top).

6- Then, we kill all prog1 instances started in the second console.

7- Only 1 instance of prog1 now remains, in the first console, and we
   observe that it again takes the original amount of time (15.0
   seconds) to complete 1500 iterations, without any overruns. Again,
   everything is normal so far.

8- Now, we kill the last instance of prog1 and we launch a new
   instance of it. This time we observe that prog1 takes more time to
   complete and incurs overruns! We can kill it and restart it again
   and again, but from now on it is always slower (more than 15.0
   seconds, with overruns).

Here are the results of the tests on the various computers/setups:

   Computer/setup       Performance loss
   ---------------      ----------------------
   A (2.0.35 SMP)       from ? and up to 30%
   A (2.1.119 SMP)      3 to 50%
   B (2.0.35 SMP)       3 to 10%
   C (2.0.35 SMP)       3 to 10%
   D (2.0.35)           from ? and up to 50%

Where the performance loss is computed as follows:

               Current_time - Initial_time
   perf_loss = --------------------------- x 100
                      Initial_time

and Initial_time should be 15.0 seconds.
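
As a small worked example of this formula, here is a sketch in C. The function names nominal_time and perf_loss are hypothetical, and the 19.5 second figure is an illustrative input, not a measurement from the tests above.

```c
/* Nominal duration of a run: number of iterations times the timer period. */
double nominal_time(long iterations, double period_seconds)
{
    return (double)iterations * period_seconds;
}

/* Percentage slowdown of a run relative to the initial (nominal) time. */
double perf_loss(double current_time, double initial_time)
{
    return (current_time - initial_time) / initial_time * 100.0;
}
```

With 1500 iterations and a 10 ms period, nominal_time(1500, 0.010) gives the expected 15.0 seconds, and a run that took 19.5 seconds would be reported by perf_loss(19.5, 15.0) as a loss of about 30%.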

When we launch only one instance of prog1, kill it, restart it and
repeat this over and over, we don't get any performance loss at all.
We only observe a performance loss after the overall load has been
high.

If you try this on your systems, do you get similar results?
Are we doing something wrong with the timer/interrupt in prog1?
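
One thing worth ruling out in the timer handling: in prog1 a SIGALRM that arrives after the "running" flag is cleared but before pause() is entered is handled without being counted as an overrun, and pause() then sleeps until the following tick. Whether this has anything to do with the slowdown reported here is not known. The sketch below shows one common way to close that window, by keeping SIGALRM blocked except inside sigsuspend(). The names run_frames and on_alarm are hypothetical. Note that with this design an overrun no longer shows up in the handler; it shows up as sigsuspend() returning immediately.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks = 0;

static void on_alarm(int signo)
{
    (void)signo;
    ticks++;
}

/* Run `frames` periods of `period_us` microseconds each.
   Returns the number of ticks handled, or -1 on error. */
int run_frames(int frames, long period_us)
{
    struct sigaction sa, old_sa;
    struct itimerval timer, stop;
    sigset_t block, old_mask, wait_mask;

    /* Block SIGALRM so a tick can only be delivered inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) < 0)
        return -1;
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGALRM);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) < 0)
        return -1;

    timer.it_interval.tv_sec  = period_us / 1000000;
    timer.it_interval.tv_usec = period_us % 1000000;
    timer.it_value = timer.it_interval;
    ticks = 0;
    if (setitimer(ITIMER_REAL, &timer, NULL) < 0)
        return -1;

    for (int f = 0; f < frames; f++) {
        /* ... the per-period computations would go here ... */
        sigsuspend(&wait_mask);   /* atomically unblock SIGALRM and wait */
    }

    /* Stop the timer, drop any tick still pending, then restore state. */
    memset(&stop, 0, sizeof stop);
    setitimer(ITIMER_REAL, &stop, NULL);
    sa.sa_handler = SIG_IGN;
    sigaction(SIGALRM, &sa, NULL);   /* discards a pending SIGALRM */
    sigaction(SIGALRM, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return (int)ticks;
}
```

Calling run_frames(1500, 10000) would correspond to one 15 second measurement run with a 10 ms period.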

When running the test, make sure you don't have any daemon that
adjusts the system time, such as xntpd, because this will alter your
results.
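
The elapsed time printed by prog1 comes from gettimeofday(), which follows the wall clock and therefore moves whenever the system time is stepped. On later systems that provide clock_gettime(), a possible alternative is to measure with CLOCK_MONOTONIC, which is not affected by discontinuous jumps of the system time (it can still be slewed gradually by NTP). This is a sketch only; the helper names now_monotonic and elapsed_seconds are hypothetical and are not part of prog1.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Read the monotonic clock; returns 0 on success, -1 on failure. */
int now_monotonic(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Seconds elapsed between two readings of the same clock. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) * 1e-9;
}
```

A measurement would take one reading before the 1500 iterations and one after, then pass both to elapsed_seconds().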

So we are a bit puzzled; perhaps we are doing something wrong, I don't
know. Any comments, suggestions and/or questions are most welcome.

Thank you for your help and your time,

Claude

-- 
  Claude Gamache, CAE Electronique Ltee, 8585 Cote-de-Liesse  
  Saint-Laurent,  Quebec, Canada H4T 1G6                        
  Email: cgamache@cae.ca  Tel.: (514) 341-2000 x3194




/*
 *    Simple Test Program: this is some sort of a benchmark program.
 *    
 *    (c) 1998 Claude Gamache <cgamache@cae.ca>, Claude Lefrancois <lefranco@cae.ca>
 *    
 *    This program is free software; you can redistribute it and/or modify
 *    it under the terms of the GNU General Public License as published by
 *    the Free Software Foundation; either version 2 of the License, or
 *    (at your option) any later version.
 *    
 *    This program is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *    GNU General Public License for more details.
 */

/*
 *  Compile with  gcc -Wall -O9 -o prog1 prog1.c -lm
 */

#include <errno.h>
#include <math.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* memcpy() */
#include <sys/time.h>
#include <unistd.h>

/* prototypes */
int  set_interrupt( float period );     /* period is in seconds  */
void interrupt_routine ( void );        /* computations statistics */
int  timeval_subtract (struct timeval *result, struct timeval *x, struct timeval *y);
void interrupt_kill_routine ( void );
void set_kill_handler( void );          /* handle kill signal */

#define VERSION 1.0
#define MAX_ITERATIONS 1500
#define VECTOR_SIZE    8192

/* Set VECNUM to the following values:            */
/* (6  with dual PPro 200 MHz 256 kb cache)       */
/* (12 with dual Pentium II 400 MHz 512 kb cache) */
#define VECNUM         12

/* Set MAX_K to the following values:             */
/* (12 with dual PPro 200 MHz 256 kb cache)       */
/* (8  with dual Pentium II 400 MHz 512 kb cache) */
#define MAX_K          8

/* 
 * If the program were doing something useful,
 * these variables would end up in shared memory.
 * They are volatile because the SIGALRM handler
 * reads and modifies some of them asynchronously.
 */
static struct sigaction old_action;
volatile long sm_running,
  sm_prog1_iterations,
  sm_prog1_interrupts,
  sm_prog1_is_running,
  sm_prog1_overruns,
  sm_prog1_overrun_has_happenned,
  sm_vector_available[VECNUM];

float sm_global_vector[VECNUM][VECTOR_SIZE];

int main( int argc, char **argv )
{
  long          i, j;
  volatile long k;
  float         local_vector[VECNUM][VECTOR_SIZE];
  float         elapsed_time = 0.0;
  static long   bench_fpass = 1;
  static struct timeval tp1, safe_tp1, tp2, result;


  /* ================================================== */
  /*          FIRST PASS INITIALISATION                 */
  /* ================================================== */
  if ( bench_fpass ) {

    printf("init\n");
    sm_prog1_interrupts  = 0;
    sm_prog1_iterations  = 0;
    sm_running           = 1;
    bench_fpass          = 0;
    
    if ( gettimeofday (&tp1, NULL) == -1 ) {
      
      printf("\nprog1.c: main(): Error - invalid time !\n");
      exit(EXIT_FAILURE);
    }
    safe_tp1 = tp1;
  }

  
  /*  connect_shm(); */ /* (eventually, if necessary a shared memory will be used) */
  
  set_interrupt( 0.010 );    /* 10 ms - can be changed   */
  set_kill_handler();        /* catch SIGINT and SIGTERM */

  /* ================================================== */
  /*                   MAIN LOOP                        */
  /* ================================================== */
  while ( sm_running ) {

    /*
      1. count the number of iterations
      2. generate new random vector
      3. copy vectors to shared memory (eventually, if necessary)
      4. wait next interrupt
      5. if the code has been running for MAX_ITERATIONS, then print stats and reset counters
    */


    sm_prog1_is_running = 1;   /* indicate that the prog1 computation loop is active */
    sm_prog1_iterations ++;


    /* Adjust as required with your CPU power to make the program overrun a bit */
    for (k=0; k<MAX_K; k++) {

      /* fill the local_vector with random numbers and play with it */
      for (i=0 ; i<VECNUM ; i++) {
      
	if ( sm_vector_available[i] ) {

	  for ( j=0; j<VECTOR_SIZE; j++) {

	    /* inefficient random vector generation; the cast avoids
	       an integer division that would almost always yield 0 */
	    local_vector[i][j] = (float) rand() / RAND_MAX;

	    /* do some computations in order to eat some CPU cycles ... */
	    if (local_vector[i][j] < 0.0)
	      local_vector[i][j] *= -1.0;
	    
	    local_vector[i][j] *= sin(local_vector[i][j]*M_PI) + log10(local_vector[i][j]+2.0*rand()/RAND_MAX);
	    local_vector[i][j] *= cos(local_vector[i][j]*M_PI+2.0*rand()/RAND_MAX) + log(local_vector[i][j]+2.0*rand()/RAND_MAX);
	  }
	}
      }
      /* Results would be transferred back to shared memory so that other processes can use them */
      memcpy (sm_global_vector, local_vector, sizeof(sm_global_vector));
    }

    /* check if enough iterations were done */
    if ( sm_prog1_iterations >= MAX_ITERATIONS ) {

      tp1 = safe_tp1;
      
      if ( gettimeofday (&tp2, NULL) == -1 ) {
	
	printf("\nprog1.c: main(): Error - invalid time !\n");
	exit(EXIT_FAILURE);
      }

      safe_tp1 = tp2;           /* latch current time for next iteration */
      timeval_subtract (&result, &tp2, &tp1);
      elapsed_time = result.tv_sec + 1e-6 * result.tv_usec;
      printf("%ld iterations in %f seconds with %ld overruns\n", sm_prog1_iterations, elapsed_time, sm_prog1_overruns );
      sm_prog1_iterations = 0;
      sm_prog1_overruns   = 0;
    }

    sm_prog1_is_running = 0;    /* indicate that the prog1 computation loop is completed */
    if (!sm_prog1_overrun_has_happenned)
      pause();                    /* wait for next interrupt, then the while loop continues */
    else
      sm_prog1_overrun_has_happenned = 0;
  }
  
  return 0;
}



/* ================================================== */
/* Copied from glibc info page                        */
/*                                                    */
/* Subtract the `struct timeval' values X and Y,
   storing the result in RESULT.
   Return 1 if the difference is negative, otherwise 0.  
*/
int timeval_subtract (struct timeval *result, struct timeval *x, struct timeval *y)
{
  /* Perform the carry for the later subtraction by updating Y. */
  if (x->tv_usec < y->tv_usec) {
    int nsec = (y->tv_usec - x->tv_usec) / 1000000 + 1;
    y->tv_usec -= 1000000 * nsec;
    y->tv_sec += nsec;
  }
  if (x->tv_usec - y->tv_usec > 1000000) {
    /* the excess must be computed as x - y, matching the test above */
    int nsec = (x->tv_usec - y->tv_usec) / 1000000;
    y->tv_usec += 1000000 * nsec;
    y->tv_sec -= nsec;
  }
     
  /* Compute the time remaining to wait.
     `tv_usec' is certainly positive. */
  result->tv_sec = x->tv_sec - y->tv_sec;
  result->tv_usec = x->tv_usec - y->tv_usec;
     
  /* Return 1 if result is negative. */
  return x->tv_sec < y->tv_sec;
}



int set_interrupt( float period )
{
  static struct sigaction action;
  static struct itimerval timer;

  /*

    `sighandler_t sa_handler'
          This is used in the same way as the ACTION argument to the
          `signal' function.  The value can be `SIG_DFL', `SIG_IGN', or
          a function pointer.  *Note Basic Signal Handling::.

    `sigset_t sa_mask'
          This specifies a set of signals to be blocked while the
          handler runs.  Blocking is explained in *Note Blocking for
          Handler::.  Note that the signal that was delivered is
          automatically blocked by default before its handler is
          started; this is true regardless of the value in `sa_mask'.
          If you want that signal not to be blocked within its handler,
          you must write code in the handler to unblock it.

    `int sa_flags'
          This specifies various flags which can affect the behavior of
          the signal.  These are described in more detail in *Note
          Flags for Sigaction::.

  */
  
  action.sa_handler         = (void (*)(int))interrupt_routine;
  timer.it_interval.tv_usec = (long int) (period*1000000.0);
  timer.it_interval.tv_sec  = 0;
  timer.it_value.tv_usec    = (long int) (period*1000000.0);
  timer.it_value.tv_sec     = 0;

  /* save the previous SIGALRM action so that it can be restored on exit */
  if ( sigaction (SIGALRM, &action, &old_action ) < 0 ) {

    perror("Error sigaction()");
    return -1;
  }

  if ( setitimer ( ITIMER_REAL , &timer, NULL ) < 0 ) {

    perror ("Error: setitimer()");
    return -1;
  }
  else
    return 0;
    
    /*
    The `setitimer' function sets the timer specified by WHICH
     according to NEW.  The WHICH argument can have a value of
     `ITIMER_REAL', `ITIMER_VIRTUAL', or `ITIMER_PROF'.

     If OLD is not a null pointer, `setitimer' returns information
     about any previous unexpired timer of the same kind in the
     structure it points to.

     The return value is `0' on success and `-1' on failure.  The
     following `errno' error conditions are defined for this function:

    `EINVAL'
    The timer interval was too large.
    */
}

void interrupt_routine ( void )
{
  sm_prog1_interrupts++;

  if (sm_prog1_is_running) {

    /*
      prog1 computations were not completed before
      the next interrupt occurred, log this as an overrun
    */
    sm_prog1_overruns++;
    sm_prog1_overrun_has_happenned = 1;
  }
  
  return;
}


 
void set_kill_handler( void )
{
  static struct sigaction kill_action;

  kill_action.sa_handler = (void (*)(int))interrupt_kill_routine;

  if ( sigaction (SIGINT, &kill_action, NULL ) < 0 ) {

    perror("Error sigaction() SIGINT");
    exit(0);
  }

  if ( sigaction (SIGTERM, &kill_action, NULL ) < 0 ) {

    perror("Error sigaction() SIGTERM");
    exit(0);
  }

  return;
}


void interrupt_kill_routine ( void )
{
  static struct itimerval timer;

  timer.it_interval.tv_usec = 0;
  timer.it_interval.tv_sec  = 0;
  timer.it_value.tv_usec    = 0;
  timer.it_value.tv_sec     = 0;

  if ( setitimer ( ITIMER_REAL , &timer, NULL ) < 0 ) {

    perror ("Error: disabling timer");
  }

  if ( sigaction (SIGALRM, &old_action, NULL ) < 0 ) {

    perror("Error disabling timer handler");
  }

  fprintf(stdout, "Bench terminated.\n");

  exit(0);
}


