Re: 2.2.10/12 kernel crashes

Nick Burrett (nick@dsvr.net)
22 Sep 1999 13:27:09 +0100


Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> > We're running linux 2.2.10 and linux 2.2.12 on various servers here.
> > The problem is that, from time to time, some of the servers crash.
>
> What hardware, what is common between the ones that crash
>
> > The biggest problem of all is that we do not know the reason for
> > the crash, the kernel produces no output, the screen is blank, no
> > keyboard, no network, no nothing.
>
> > I would like to find the answer to our problems, as these server
> > problems are not rare. But since these problems only occur on
> > live systems, what's the best way to go about trying to tackle
> > this ?
>
> #1. A serial console will grab a large crash dump.

Thanks. I'll stick one on and see what happens.

> Firstly though - is the box SMP ?

> Is it a socket 7 based board (eg K6), if so does turning off the BIOS APM
> and the USB support in the BIOS help.
> Do you see any other oddities - weird core dumps etc ?

There are two server types:

1. Cyrix 686/MX 333Mhz
Intel EtherexpressPro 10/100
Motherboard: AT Socket 7 TxPro chipset
1 Fujitsu MPB3064ATU hard disk
1 Maxtor 91080D5 hard disk

2. Intel Pentium II 400Mhz
ATX Gigabyte SMP Motherboard Intel 440BX AGP chipset
Intel EtherexpressPro 10/100
3 IBM 14Gb hard discs

There are nine servers in total, the system disks are built from a
single `build disk'. So the software is indentical. The servers
run virtual servers (anything upto 60 of them). Each virtual server
runs Apache 1.3.6, crond, syslogd and atd. There is no NFS support.
There is IP firewall support for the purposes of IP accounting only.
The SMP motherboards contain only 1 processor. No servers run X.

The box that crashed this morning was server type 2, running
kernel 2.2.10 but is not compiled for SMP support. Most have crashed,
with the same symptons, within the last 8 days (some built with an
SMP enabled kernel, some not). One server (of type 2) with a 2.2.10 SMP
kernel has been up 80 days.

The kernel's built with egcs 1.1.2. BIOS APM is turned off, but APM is
turned on in the kernel for CPU idle calls. However this problem has
been seen without APM support in the kernel. USB support is turned
off.

There is one oddity that occurs on all servers and that is to do with
syslogd. The servers run 60 virtual servers. Each virtual server runs
Apache 1.3.6, crond, atd and syslogd. However, syslogd tends to die
quite regularly. It blocks on a `select' syscall for most of the time (as
expected) but then receives a SIGTERM and dies.

To temporarily get around this problem, I masked out the SIGTERM signal
and syslogd now runs 99% of the time fine. However, it does still die
(without any errors) and occasionally `crond' also dies.

Regards,

Nick.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/