Re: slowdown 2.0.35 vs 2.1.121 (pre 1)

Claude Gamache (cgamache@cae.ca)
10 Sep 1998 11:10:33 -0400


Hi Simon,

> built 2.1.121 while running 2.0.34 and then built it again (after doing make
> clean; make xconfig; make dep) on top of the 2.1.121 previously built
>
> there was a 6% slowdown 9:14 vs 8:51
> is this to be expected?

We observed a similar performance loss with the 2.1.xxx kernels. I
also saw a few posts regarding a slowdown of the X server. The bottom
line is that not only gcc or the X server are slowing down: all
processes are slowing down. The problem (the performance loss or, if
you wish, the slowdown) occurs after the system has been heavily
loaded (building the kernel is an example of a heavy load for the
system).

> i know the test was a bit rough and ready
> is 6% statistically significant?

We observed a worse performance loss under certain conditions, so I
believe that your measurement is quite significant.

> system is PC164LX-533/2M cache/256M running off a Quantum Fireball SE4.3A
>
> had to turn auto dma off to get 2.1.121 to boot
> and also had to fiddle with ide-probe.c and strnicmp to get it to compile
> --
> simon

In order to demonstrate and measure the system performance loss, I
wrote a little test program, since we don't time (or benchmark) the
usual applications such as emacs, gcc and so on in our everyday work.

The program is silly: it just does computations in order to eat CPU
cycles, so that we can see how the system behaves when only one
instance of the program is running, and also when many instances are
running. The first time you run the program you will get "normal"
results. Once the system is "broken" (altered), the program's timings
and overruns will increase!! This shows that something was altered in
the system, and you need to reboot to get the system back to its
optimum! But even if you don't run the test code, as soon as you load
your system heavily, you will end up with a general performance loss
(slowdown). The problem is subtle.
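If it helps to see the mechanism in isolation, here is a minimal
sketch of the timing logic prog1 (full source at the end of this
message) is built on, with the computation load stripped out. The
names (tick_handler, working, overruns) are mine, not from prog1, but
the logic is the same: a periodic SIGALRM from setitimer(), pause()
between ticks, and an overrun counted whenever the timer fires while
the work loop is still running.

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t working;      /* set while the work loop runs */
static volatile sig_atomic_t overruns;     /* ticks that caught us still working */
static volatile sig_atomic_t overrun_seen; /* tells the main loop to skip pause() */

static void tick_handler ( int sig )
{
    (void) sig;
    if (working) {        /* the work did not finish before this tick */
        overruns++;
        overrun_seen = 1;
    }
}

int main( void )
{
    struct sigaction sa;
    struct itimerval tv;
    int i;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = tick_handler;
    sigaction(SIGALRM, &sa, NULL);

    memset(&tv, 0, sizeof(tv));
    tv.it_interval.tv_usec = 100000;    /* 100 ms, like RATE in prog1 */
    tv.it_value.tv_usec    = 100000;
    setitimer(ITIMER_REAL, &tv, NULL);

    for (i = 0; i < 150; i++) {         /* 150 ticks = 15.0 seconds */
        working = 1;
        /* ... a fixed amount of computation goes here ... */
        working = 0;

        if (overrun_seen)
            overrun_seen = 0;           /* already late, don't wait */
        else
            pause();                    /* sleep until the next SIGALRM */
    }

    printf("150 iterations, %d overruns\n", (int) overruns);
    return 0;
}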

The test procedure is done in the following way:

1- Boot your system with your standard configuration (according to
your distribution). In our case, we disable xdm and we don't launch
the X server. It is a good idea to disable any program that plays
with the system time, such as xntpd, for the duration of the tests.

2- In a virtual console, we launch 1 instance of prog1.

3- We measure, a few times, the time prog1 takes to complete its 1500
iterations. It should take 15.0 seconds without any overruns (you
have to adjust MAX_K according to your computer; see the comments in
the code).

4- In a second virtual console, we launch 10 other instances of prog1
(or more; there is nothing magic about the number 10) while the first
one is still running (with a dual PII 400 MHz, we launch between 50
and 100 other instances of prog1). The idea is to launch as many
other instances as required to load the system heavily.

5- When all instances are running, we can observe in the first
console that prog1 takes more time to complete its 1500 iterations
and makes overruns; so far, the behavior is correct. On kernel
2.1.119 the CPU load is well balanced (all prog1 instances
take about the same CPU percentage, according to top).

6- Then, we kill all prog1 instances started in the second console.

7- So only 1 instance of prog1 remains in the first console, and we
observe that it takes the same amount of time (15.0 seconds) to
complete its 1500 iterations, without any overruns. Again, everything
is normal so far.

8- Now, we kill the last instance of prog1 and we launch a new
instance of it. And then, we observe that prog1 takes more time to
complete and makes overruns!!! We can kill it and restart it, again
and again, but now it is always slower than it was previously (more
than 15.0 seconds, with overruns). This is not logical at all: the
process prog1 should take the same amount of time, but this is not
the case!!

You do not need to follow this procedure exactly to observe the
performance loss. Simply launch 1 instance of prog1 right after
rebooting your system, observe the timing, and then launch a large
number of them. Then kill them all and launch one more instance of
prog1: this new prog1 will take more time to complete and will make
overruns.
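
(To try it, compile prog1 with "gcc -Wall -O9 -o prog1 prog1.c -lm",
as noted in the source below, and run ./prog1 in each console; it
prints its timing and overrun count after every MAX_ITERATIONS
iterations.)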

The performance loss is computed as follows:

            Current_time - Initial_time
perf_loss = --------------------------- x 100
                   Initial_time

and Initial_time should be 15.0 seconds.
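
For example, if prog1 initially completes in 15.0 seconds and takes
16.0 seconds after the system has been heavily loaded, the
performance loss is (16.0 - 15.0) / 15.0 x 100 = 6.7%.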

With a dual Pentium II 400 MHz (512 kb cache) we observed a
performance loss varying from 3% to 10%. In some cases, the
performance loss can go up to 50%, but that does not happen often. We
also reproduced the problem with a single PPro 200 MHz and a dual
PPro 200 MHz.

When running the test, make sure you don't have any daemon that plays
with the system time, such as xntpd, because this might alter your
results.

If you try this on your systems, do you get similar results?
Are we doing something wrong with the timer/interrupt in prog1?
Any comments, suggestions and/or questions are warmly welcomed.

Please keep me informed of your results, thank you for your time and
your help.

Kind regards,

Claude

Here is the source code of prog1:

/*
* Simple Test Program: this is some sort of a benchmark program.
*
* (c) 1998 Claude Gamache <cgamache@cae.ca>, Claude Lefrancois <lefranco@cae.ca>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/

/*
* Compile with gcc -Wall -O9 -o prog1 prog1.c -lm
*/

#include <errno.h>
#include <math.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* for memcpy() */
#include <sys/time.h>
#include <unistd.h>

/* prototypes */
int set_interrupt( float period ); /* period is in seconds */
void interrupt_routine ( int sig ); /* counts interrupts and overruns */
int timeval_subtract (struct timeval *result, struct timeval *x, struct timeval *y);
void interrupt_kill_routine ( int sig );
void set_kill_handler( void ); /* handle the kill signals */

#define VERSION "1.0.1"
#define MAX_ITERATIONS 150 /* 150 if timer set to 100 ms, 1500 if timer set to 10 ms */
#define VECTOR_SIZE 8192
#define RATE ((float) 0.100) /* interrupt period in seconds */
#define VECNUM 6
#define MAX_K 120

/*
With a dual PPro 200 MHz 256 kb cache, the following combinations can
be tried:

VECNUM = 6
MAX_K = 120
RATE = 0.1

The idea is to keep the computation load high,

load = VECNUM * MAX_K * (1/RATE) = 6 * 120 * 10 = 7200

So if you change the defines, make sure the "load" remains at approx.
7200 or more. With a clean system, prog1 must complete its
MAX_ITERATIONS in 15.0 seconds with (almost always) 0 overruns.

Also, in order to compare the results, keep MAX_ITERATIONS * RATE = 15 seconds.

Set VECNUM to the following values:
-----------------------------------
(6 with dual PPro 200 MHz 256 kb cache)
(12 with dual Pentium II 400 MHz 512 kb cache)
(6 with single Pentium MMX 233 MHz)

Set MAX_K to the following values:
----------------------------------
(12 with dual PPro 200 MHz 256 kb cache)
(8 with dual Pentium II 400 MHz 512 kb cache)
(120 with single Pentium MMX 233 MHz and MAX_ITERATIONS = 150)

*/

/*
 * if the program were doing something useful,
 * these variables would end up in a shared memory
 */
static struct sigaction old_action;
long sm_running,
     sm_prog1_iterations,
     sm_prog1_interrupts,
     sm_prog1_is_running,
     sm_prog1_overruns,
     sm_prog1_overrun_has_happened,
     sm_vector_available[VECNUM];

float sm_global_vector[VECNUM][VECTOR_SIZE];

int main( int argc, char **argv )
{
    long i, j;
    volatile long k;
    float local_vector[VECNUM][VECTOR_SIZE];
    float elapsed_time = 0.0;
    static long bench_fpass = 1;
    static struct timeval tp1, safe_tp1, tp2, result;

    /* ================================================== */
    /* FIRST PASS INITIALISATION                          */
    /* ================================================== */
    if ( bench_fpass ) {

        printf("init\n");
        sm_prog1_interrupts = 0;
        sm_prog1_iterations = 0;
        sm_running = 1;
        bench_fpass = 0;

        if ( gettimeofday (&tp1, NULL) == -1 ) {
            printf("\nprog1.c: main(): Error - invalid time !\n");
            exit(1);
        }
        safe_tp1 = tp1;
    }

    /* connect_shm(); */ /* (eventually, if necessary, a shared memory will be used) */

    set_interrupt( RATE );  /* RATE seconds (100 ms by default) - can be changed */
    set_kill_handler();     /* catch SIGINT and SIGTERM */

    /* ================================================== */
    /* MAIN LOOP                                          */
    /* ================================================== */
    while ( sm_running ) {

        /*
          1. count the number of iterations
          2. generate a new random vector
          3. copy the vectors to shared memory (eventually, if necessary)
          4. wait for the next interrupt
          5. if the code has been running for MAX_ITERATIONS,
             then print the statistics and reset the counters
        */

        sm_prog1_is_running = 1; /* indicate that the prog1 computation loop is active */
        sm_prog1_iterations++;

        /* Adjust MAX_K as required for your CPU power to make the program overrun a bit */
        for (k=0; k<MAX_K; k++) {

            /* fill local_vector with random numbers and play with it */
            for (i=0; i<VECNUM; i++) {

                if ( sm_vector_available[i] ) {

                    for (j=0; j<VECTOR_SIZE; j++) {

                        /* inefficient random vector generation (the cast avoids
                           an integer division, which would always yield 0) */
                        local_vector[i][j] = (float) rand() / RAND_MAX;

                        /* do some computations in order to eat some CPU cycles ... */
                        if (local_vector[i][j] < 0.0)
                            local_vector[i][j] *= -1.0;

                        local_vector[i][j] *= sin(local_vector[i][j]*M_PI)
                                            + log10(local_vector[i][j]+2.0*rand()/RAND_MAX);
                        local_vector[i][j] *= cos(local_vector[i][j]*M_PI+2.0*rand()/RAND_MAX)
                                            + log(local_vector[i][j]+2.0*rand()/RAND_MAX);
                    }
                }
            }
            /* Results would be transferred back to shared memory so that
               other processes could use them */
            memcpy (sm_global_vector, local_vector, sizeof(sm_global_vector));
        }

        /* check if enough iterations were done */
        if ( sm_prog1_iterations >= MAX_ITERATIONS ) {

            tp1 = safe_tp1;

            if ( gettimeofday (&tp2, NULL) == -1 ) {
                printf("\nprog1.c: main(): Error - invalid time !\n");
                exit(1);
            }

            safe_tp1 = tp2; /* latch the current time for the next round */
            timeval_subtract (&result, &tp2, &tp1);
            elapsed_time = result.tv_sec + 1e-6 * result.tv_usec;
            printf("%ld iterations in %f seconds with %ld overruns\n",
                   sm_prog1_iterations, elapsed_time, sm_prog1_overruns );
            sm_prog1_iterations = 0;
            sm_prog1_overruns = 0;
        }

        sm_prog1_is_running = 0; /* indicate that the prog1 computation loop is completed */
        if (!sm_prog1_overrun_has_happened)
            pause(); /* wait for the next interrupt, then the while loop continues */
        else
            sm_prog1_overrun_has_happened = 0;
    }

    return 0;
}

/* ================================================== */
/* Copied from the glibc info page                    */
/*                                                    */
/* Subtract the `struct timeval' values X and Y,
   storing the result in RESULT.
   Return 1 if the difference is negative, otherwise 0.
*/
int timeval_subtract (struct timeval *result, struct timeval *x, struct timeval *y)
{
    /* Perform the carry for the later subtraction by updating Y. */
    if (x->tv_usec < y->tv_usec) {
        int nsec = (y->tv_usec - x->tv_usec) / 1000000 + 1;
        y->tv_usec -= 1000000 * nsec;
        y->tv_sec += nsec;
    }
    if (x->tv_usec - y->tv_usec > 1000000) {
        int nsec = (x->tv_usec - y->tv_usec) / 1000000;
        y->tv_usec += 1000000 * nsec;
        y->tv_sec -= nsec;
    }

    /* Compute the time remaining to wait.
       `tv_usec' is certainly positive. */
    result->tv_sec = x->tv_sec - y->tv_sec;
    result->tv_usec = x->tv_usec - y->tv_usec;

    /* Return 1 if result is negative. */
    return x->tv_sec < y->tv_sec;
}

int set_interrupt( float period )
{
    static struct sigaction action;
    static struct itimerval timer;

    /* (from the glibc info page)

       `sighandler_t sa_handler'
            This is used in the same way as the ACTION argument to the
            `signal' function. The value can be `SIG_DFL', `SIG_IGN', or
            a function pointer. *Note Basic Signal Handling::.

       `sigset_t sa_mask'
            This specifies a set of signals to be blocked while the
            handler runs. Blocking is explained in *Note Blocking for
            Handler::. Note that the signal that was delivered is
            automatically blocked by default before its handler is
            started; this is true regardless of the value in `sa_mask'.
            If you want that signal not to be blocked within its handler,
            you must write code in the handler to unblock it.

       `int sa_flags'
            This specifies various flags which can affect the behavior of
            the signal. These are described in more detail in *Note
            Flags for Sigaction::.
    */

    action.sa_handler = interrupt_routine;
    timer.it_interval.tv_usec = (long int) (period*1000000.0);
    timer.it_interval.tv_sec = 0;
    timer.it_value.tv_usec = (long int) (period*1000000.0);
    timer.it_value.tv_sec = 0;

    /* save the previous SIGALRM action in old_action so that
       interrupt_kill_routine() can restore it on exit */
    if ( sigaction (SIGALRM, &action, &old_action ) < 0 ) {
        perror("Error sigaction()");
        return -1;
    }

    if ( setitimer ( ITIMER_REAL , &timer, NULL ) < 0 ) {
        perror ("Error: setitimer()");
        return -1;
    }
    else
        return 0;

    /* (from the glibc info page)

       The `setitimer' function sets the timer specified by WHICH
       according to NEW. The WHICH argument can have a value of
       `ITIMER_REAL', `ITIMER_VIRTUAL', or `ITIMER_PROF'.

       If OLD is not a null pointer, `setitimer' returns information
       about any previous unexpired timer of the same kind in the
       structure it points to.

       The return value is `0' on success and `-1' on failure. The
       following `errno' error conditions are defined for this function:

       `EINVAL'
            The timer interval was too large.
    */
}

void interrupt_routine ( int sig )
{
    (void) sig; /* unused */

    sm_prog1_interrupts++;

    if (sm_prog1_is_running) {

        /*
          prog1 computations were not completed before
          the next interrupt occurred, log this as an overrun
        */
        sm_prog1_overruns++;
        sm_prog1_overrun_has_happened = 1;
    }

    return;
}


void set_kill_handler( void )
{
    static struct sigaction kill_action;

    kill_action.sa_handler = interrupt_kill_routine;

    if ( sigaction (SIGINT, &kill_action, NULL ) < 0 ) {
        perror("Error sigaction() SIGINT");
        exit(1);
    }

    if ( sigaction (SIGTERM, &kill_action, NULL ) < 0 ) {
        perror("Error sigaction() SIGTERM");
        exit(1);
    }

    return;
}

void interrupt_kill_routine ( int sig )
{
    static struct itimerval timer;

    (void) sig; /* unused */

    timer.it_interval.tv_usec = 0;
    timer.it_interval.tv_sec = 0;
    timer.it_value.tv_usec = 0;
    timer.it_value.tv_sec = 0;

    /* disable the periodic timer and restore the previous SIGALRM action */
    if ( setitimer ( ITIMER_REAL , &timer, NULL ) < 0 ) {
        perror ("Error: disabling timer");
    }

    if ( sigaction (SIGALRM, &old_action, NULL ) < 0 ) {
        perror("Error disabling timer handler");
    }

    fprintf(stdout, "Bench terminated.\n");

    exit(0);
}

-- 
  Claude Gamache, CAE Electronique Ltee, 8585 Cote-de-Liesse  
  Saint-Laurent,  Quebec, Canada H4T 1G6                        
  Email: cgamache@cae.ca  Tel.: (514) 341-2000 x3194
