Mysterious 2.0.33 lockups

Joe Konopka (jkonopka@itol.com)
Wed, 18 Feb 1998 13:41:12 -0600 (CST)


I just had a solid lockup with 2.0.33 that matches the description of what
others seem to be encountering. Circumstances:

- Nearly stock 2.0.33 kernel tree, only changes are Solar Designer's
non-exec stack patch, and version 0.83 of the tulip.c driver. Contents
of .config, dmesg output, etc are available if anyone wants them.

- Hardware:
  SMC 10/100 ethernet card, tulip.c version 0.83 (in use)
  Generic PCI NE2000 using stock driver (ifconfig'd down)
  ABIT IT5H motherboard, 1 CPU AMD K5, 64 MB RAM
  Diamond PCI VGA - text mode only, no X or anything fancy.

- Relevant kernel options:
I keep most of my infrequently used stuff, like sound, cdrom,
etc, as modules and load them only when needed. The tulip
driver is built as a module, and was the only one loaded at
the time of the freeze. Triton DMA support is enabled and
was active on the drive that would've been most active at the
time. Kernel was compiled with gcc 2.7.2.3, libc 5.4.38.

The machine had been running fine, with an uptime around 13 days. I don't
usually run nfs on this machine, but I brought it up today to move about
700 megs from another machine. NFS server is unfsd v2.2beta29.

I mounted the machine's nfs export from another linux box (2.0.30),
started the copy and left it unattended for a while. Upon returning, the
2.0.33 box was locked up solid -- no ctrl-alt-del, no switching VCs, no
network response at all. I have console blanking disabled, and the console
was frozen as I left it: no errors, oopses, or anything else printed.
Nothing in the logs either.

I don't know if this is a "nfs thing" or not. Incidentally, I had a
similar experience before with tulip.c version 0.86 -- it ran fine for a
week, then I loaded the nfs server daemons and started a copy, and within
a few minutes got an "Aiee -- killing interrupt handler" in the tulip
driver's handler, which blew away network support. I rmmod'd and
modprobed the driver and it started working again, but I rebooted right
afterward to clean up. I was having other problems with 0.86 as well
(random Aiee's after a week or so of uptime), so I fell back to 0.83 and
all was stable until now.

Incidentally, we have another 2.0.33 server, same version of unfsd,
moderately different hardware, which has been up 30-some days under very
heavy NFS load with no problems. That machine is a K6-200 and currently
is using a Linksys PCI NE2000 clone (Winbond chip). It's also got Solar
Designer's patch, and is entirely SCSI (2940UW/aic78xx). The one that
locked up is all IDE.

I've been following the 'mysterious 2.0.33 lockup' thread and just thought
I'd send this report to get some more info on the table and maybe help to
figure out what's going on. If anyone needs more info, or wants patches
tested, please contact me.
