Re: Back in Production mode Again...

Doug Ledford (dledford@dialnet.net)
Sat, 15 Nov 1997 13:33:51 -0600 (CST)


On 15-Nov-97 Larry McVoy wrote:
>: My question is this: what -is- a high load average? Over most of the
>: period of a year of running this system as a production fileserver, The
>: 'average' load average is probably near .25; so just what -are- reasonable
>: loads on this system? (disk subsystems as described; dual P-166s, not
>: overclocked; 32MB of crappy ram (we're cursed with the SIG-11 and so we
>: compile off of this machine) on an older Tyan II Tomcat.

First, before I get into the load average stuff: REPLACE YOUR RAM! I
cannot stress this loudly enough. If you get GCC sig11 errors and have
known bad RAM, then don't stop compiling on your machine; fix your machine.
The failed compiles are merely one symptom of the problem. Another is that
heavy disk usage on that 2940 SCSI controller can (and eventually *WILL*)
lead to disk corruption and loss of data. The bus-mastering design of the
card and the driver will sometimes even find memory errors that GCC misses
(mainly when both the CPU and the card are trying to access RAM at the same
time, which is a higher load than GCC places on RAM by itself). A good test
to prove my point to you is this:

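cd /usr/src
# Unpack one reference copy of the kernel tree.
tar xzf linux-2.0.29.tar.gz
mv linux linux.orig
# Unpack it ten more times and compare each copy against the reference.
for i in 1 2 3 4 5 6 7 8 9 10
do
    tar xzf linux-2.0.29.tar.gz
    diff -U 2 -rN linux.orig linux
    rm -fr linux
done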

If that little script creates any output on your screen, then you've just
seen disk corruption caused by this faulty RAM.
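
If you want to run the same check against some other tarball, or keep a
count of how many passes went wrong, something like the following might
do. It is only a sketch of the same unpack-and-compare idea: the function
name repeat_unpack_check and its arguments are made up for illustration,
and it assumes a tar that accepts -C plus mktemp and diff on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): unpack TARBALL once as a
# reference, then unpack it PASSES more times and compare each copy.
# Usage: repeat_unpack_check TARBALL [PASSES]
repeat_unpack_check() {
    tarball=$1
    passes=${2:-10}
    work=$(mktemp -d) || return 2
    mkdir "$work/orig" || return 2
    tar xzf "$tarball" -C "$work/orig" || return 2

    failures=0
    i=1
    while [ "$i" -le "$passes" ]; do
        # Start every pass from an empty directory.
        rm -rf "$work/copy"
        mkdir "$work/copy" || return 2
        tar xzf "$tarball" -C "$work/copy"
        # Any difference means this copy does not match the reference.
        if ! diff -r "$work/orig" "$work/copy" >/dev/null; then
            echo "pass $i: extracted tree differs from reference"
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done

    rm -rf "$work"
    echo "$failures of $passes passes differed"
    # Succeed only if every pass matched.
    [ "$failures" -eq 0 ]
}

# Example (path is illustrative):
#   repeat_unpack_check /usr/src/linux-2.0.29.tar.gz 10
```

A non-zero count means the same thing as output from the script above: the
same archive did not unpack to the same bytes twice.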

>"load average" in Unix, not just Linux, is a misnomer. All it means is
>that that is the number of processes waiting (sleeping) in the kernel.
>On some systems (I think, I'm hazy here) only processes sleeping in disk
>wait are counted; on others I think it is all sleeping processes.

Essentially, any process with a state of R or D (as reported by ps) is
counted in this number. Of course, D usually indicates that the program is
waiting on some sort of disk activity. So, as you can imagine, if you have
a lot of programs accessing the disk at the same time, your load average can
get quite high. There's nothing wrong with that. I know people who have
maintained load averages as high as 180 for 24 hours or more without
problems; it just means your computer is outrunning your disks. Myself,
I've maintained load averages as high as 120 for extended periods without
problems.
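
To see this on your own box, here is a rough sketch that counts the
processes currently in state R or D and prints the kernel's own figures
next to that count. The function name count_rd is made up for
illustration, and the script assumes a procps-style ps that accepts
"-eo stat=" and a Linux /proc/loadavg. The kernel's numbers are averages
over time, so don't expect an instantaneous count to match them exactly.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read one process-state
# code per line on stdin (the STAT column of ps) and print how many begin
# with R (runnable) or D (uninterruptible sleep, usually disk wait).
count_rd() {
    awk '$1 ~ /^[RD]/ { n++ } END { print n + 0 }'
}

# Assumption: a procps-style ps that accepts "-eo stat=".
if command -v ps >/dev/null 2>&1; then
    echo "R or D right now: $(ps -eo stat= 2>/dev/null | count_rd)"
fi

# Assumption: Linux, where the kernel's averages live in /proc/loadavg.
if [ -r /proc/loadavg ]; then
    echo "kernel load averages: $(cut -d' ' -f1-3 /proc/loadavg)"
fi
```

If the count is dominated by D entries, the load is coming from processes
waiting on the disks rather than from a shortage of CPU.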

----------------------------------
E-Mail: Doug Ledford <dledford@dialnet.net>
Date: 15-Nov-97
Time: 13:33:52
----------------------------------