Re: Back in Production mode Again...

Mr. James W. Laferriere Network Engineer (babydr@nwrain.net)
Sat, 15 Nov 1997 14:05:44 -0800 (PST)


Hello Doug,

Thank you for providing this simple script.
At one time I received the SIG11 response quite
regularly. I replaced the memory some time ago, and
now the SIG11 doesn't show up very often. But the very
first grep run through showed up a difference, so I guess
I still have a memory problem of some type.

Is there a chance that the BIOS memory timings are off?
Or, if they were off, would there be a lot more difficulties?

Tia, JimL

PS: I don't have an Adaptec; I use NCRs, so it isn't that.

On Sat, 15 Nov 1997, Doug Ledford wrote:
> On 15-Nov-97 Larry McVoy wrote:
> >: My question is this: what -is- a high load average? Over most of the
> >: period of a year of running this system as a production fileserver, the
> >: 'average' load average is probably near .25; so just what -are- reasonable
> >: loads on this system? (disk subsystems as described; dual P-166s, not
> >: overclocked; 32MB of crappy ram (we're cursed with the SIG-11 and so we
> >: compile off of this machine) on an older Tyan II Tomcat.)
>
> First, before I get into the load average stuff: REPLACE YOUR RAM! I
> cannot stress this loudly enough. If you get GCC sig11 errors and have
> known bad RAM, then don't stop compiling on your machine, fix your machine.
> The compiles are merely one symptom of this problem. Another is that heavy
> disk usage on that 2940 SCSI controller can (and eventually *WILL*) lead to
> disk corruption and loss of data. The BusMastering design of the card and
> the driver will sometimes even find memory errors that GCC misses (mainly
> when both the CPU and the card are trying to access RAM at the same time,
> which is a higher load than GCC places on RAM by itself). A good test to
> prove my point to you is this:
>
> cd /usr/src
> tar xzf linux-2.0.29.tar.gz
> mv linux linux.orig
> for i in 1 2 3 4 5 6 7 8 9 10
> do
> tar xzf linux-2.0.29.tar.gz
> diff -U 2 -rN linux.orig linux
> rm -fr linux
> done
>
>
> If that little script creates any output on your screen, then you've just
> seen disk corruption caused by this faulty RAM.
>
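
A more self-contained variant of the loop quoted above, as a rough sketch
only: it works in a temporary directory instead of /usr/src, takes the
archive path as an argument, and counts the runs that differ from the
first extraction. The function name check_extract and its arguments are
made up for illustration; they are not from Doug's post. A small archive
will likely be served from the page cache, so it will not stress RAM and
the disk controller the way the full kernel tree does.

```shell
#!/bin/sh
# Sketch: extract one tarball several times and compare every copy
# against a reference extraction. check_extract is a hypothetical name.
check_extract() {
    tarball=$1          # path to a .tar.gz archive
    runs=${2:-10}       # number of comparison passes (default 10)
    work=$(mktemp -d) || return 2
    failures=0

    # Reference copy, extracted once.
    mkdir "$work/orig"
    tar -xzf "$tarball" -C "$work/orig" || { rm -rf "$work"; return 2; }

    i=1
    while [ "$i" -le "$runs" ]; do
        mkdir "$work/copy"
        tar -xzf "$tarball" -C "$work/copy" || { rm -rf "$work"; return 2; }
        # diff -r exits non-zero if any file differs or is missing.
        if ! diff -r "$work/orig" "$work/copy" >/dev/null; then
            echo "run $i: extracted tree differs from reference"
            failures=$((failures + 1))
        fi
        rm -rf "$work/copy"
        i=$((i + 1))
    done

    rm -rf "$work"
    # Succeed only if every pass matched the reference.
    [ "$failures" -eq 0 ]
}
```

For example, `check_extract /usr/src/linux-2.0.29.tar.gz 10 && echo clean`
would roughly mirror the ten-pass loop in the quoted script.
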
> >"load average" in Unix, not just Linux, is a misnomer. All it means is
> >that that is the number of processes waiting (sleeping) in the kernel.
> >On some systems (I think, I'm hazy here) only processes sleeping in disk
> >wait are counted; on others I think it is all sleeping processes.
>
> Essentially, any process with a state of R or D (as reported by ps) is
> counted in this number. Of course, D usually indicates that the program is
> waiting on some sort of disk activity. So, as you can imagine, if you have
> a lot of programs accessing the disk at the same time, your load average can
> get quite high. There's nothing wrong with that. I know people that have
> maintained load averages as high as 180 for 24 hours or more without
> problems; it just means your computer is outrunning your disks. Myself,
> I've maintained load averages as high as 120 for extended periods without
> problem.
>
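
To see the R and D states described above on a running box, something
like the following sketch may help. It counts state codes that begin with
R or D, which gives an instantaneous figure, whereas the load average the
kernel reports is smoothed over time, so the two will not match exactly.
The function name count_rd is made up for illustration, and the `stat`
column name assumes a procps-style ps.

```shell
#!/bin/sh
# Sketch: count processes whose state code starts with R (runnable)
# or D (uninterruptible sleep, usually disk wait).
# count_rd is a hypothetical helper; it reads one state code per line.
count_rd() {
    awk '{ s = substr($1, 1, 1); if (s == "R" || s == "D") n++ }
         END { print n + 0 }'
}

# Typical use on a live system (assumes ps accepts -eo stat=):
#   ps -eo stat= | count_rd
# The ps process itself is running while it reports, so the count is
# normally at least 1.
```

On Linux the smoothed 1, 5, and 15 minute figures can be read from
/proc/loadavg for comparison.
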
>
> ----------------------------------
> E-Mail: Doug Ledford <dledford@dialnet.net>
> Date: 15-Nov-97
> Time: 13:33:52
> ----------------------------------

+-----------------------------------------------------------------------+
| James W. Laferriere - Network Engineer - babydr@nwrain.net |
| System Techniques - 25416 - 22nd S. - Kent, WA 98032 |
| Give me VMS -or- Give me Linux -but- only on AXP |
+-----------------------------------------------------------------------+
|-> Linux-Vax Port, Now in Progress !YAY! there's Progress To Report <-|
|-> Please See http://ucnet.canberra.edu.au/~mikal/vaxlinux/home.html <-|
|-> Maintainer: Michael Still mikal@blitzen.canberra.edu.au <-|
+-----------------------------------------------------------------------+