HARD CRASH with kernels (2.2.4,12,14,15)

From: Steven Uggowitzer (uggowitzers@internet.who.int)
Date: Wed May 10 2000 - 08:15:01 EST


This problem occurs on our primary web server. The symptom is a completely
frozen machine, typically with no error messages. It occurs at random
intervals of between 2 hours and 3 days. I have seen similar reports on
several newsgroups and on this list. My architecture is:

Compaq 1850R SMP
cpqarray SCSI
2 x 600MHz Pentium III
tlan ethernet card
RedHat 6.1 with kernel 2.2.15 SMP i686

I experience the problems with both SMP and non-SMP kernels and machines.
Things that might help:

I run NFS client and server, autofs and NIS client on this PC, but the
freeze also occurs with all of these turned off and not compiled into the
kernel.
Occasionally I find in my syslog:
May 9 16:43:19 kenny inetd[522]: pid 9741: exit status 255
which I can't account for.

I turn off the console blanking every time I boot.
Once when a crash occurred (2.2.14 SMP) I observed the following:
<[c010adb1]><[c016959d]><[c011F259]><[c0177ddc]><[c01509da]>
wait_on_bh, CPU 0:
irq: 0[0 0]
bh: 1[0 1]

These 4 lines repeated 5 times, then the first line appeared once more, and
then the machine froze.
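
The bracketed addresses above only mean something once they are matched
against the System.map of the exact kernel that crashed. As a rough sketch
of that lookup (not a replacement for a proper oops decoder), the following
assumes the usual three-column System.map layout of "address type name".
The function names here are made up for illustration.

```python
import bisect
import re


def load_system_map(text):
    """Parse System.map text into a sorted list of (address, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        # Assumed layout: "<hex address> <type letter> <symbol name>"
        if len(parts) == 3:
            addr, _type, name = parts
            symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoffset' for the nearest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = symbols[i]
    return "%s+0x%x" % (name, addr - base)


def decode_trace(symbols, trace):
    """Resolve every [xxxxxxxx] address found in a trace line, in order."""
    found = re.findall(r"\[([0-9a-fA-F]{8})\]", trace)
    return [resolve(symbols, int(h, 16)) for h in found]
```

To use it, read the System.map that belongs to the crashed kernel, pass its
contents to load_system_map(), and hand the trace line to decode_trace().
The result is only as good as the match between the map and the running
kernel image.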

The machine is a very busy plain HTML server with almost no CGI scripts. CPU
load is minimal (<2%), however we average over 10 HTTP requests per second.
All HTML is on the local filesystem. We run many other servers with similar
kernels without problems; the only difference is the TCP session load. I
have tried completely changing the hardware and even used a completely
different vendor (HP Netserver LX SMP with Adaptec 2940s) -- still the same
problem. There are no visible IRQ conflicts. Changing the httpd version and
recompiling also didn't help.

It is driving me crazy. Previously I ran RedHat 5.1 with 2.2.4 for over a
year without any need for a reboot. Now even 2.2.4 is giving me this
problem. This server has seen an ever-increasing TCP session load as the
web site has become more popular. IMHO the crashes must be tied to this
somehow.

Please contact me if you need more details.

Steven Uggowitzer, uggowitzers@who.int
World Health Organization
Geneva, Switzerland
