NFS causing OOPSes

Derek Glidden (dglidden@illusionary.com)
Fri, 15 Oct 1999 00:48:24 -0400


I have a machine (my file server) that has been generating crashes
pretty reliably while doing NFS stuff since I put RedHat 6.0 on it. It
was remarkably stable with RedHat 5.2, but I made the mistake of
upgrading the box because I thought that would be the easiest way to
make sure I could take advantage of kernel 2.2 so I could use the latest
RAID patches and figured a glibc 2.1 was going to be even more stable
than glibc 2.0 had been. (Silly me...)

RedHat 6.0 uses knfsd, which I got sick of watching crash so today I
removed knfsd and got and compiled 2.2.13pre17 (and applied the latest
RAID patches since I have a 36GB RAID on that box) without all the funky
stuff RedHat sticks in the kernel and the "old fashioned" nfs-server
2.2beta46 and thought "that's that" except that it's still crashing any
time there's any significant NFS traffic. Here's the latest error
message (which actually happened while I was writing this message...):

Unable to handle kernel paging request at virtual address 8b0c7536
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c0123bf5>]
EFLAGS: 00010282
eax: 8b0c7502 ebx: 00000000 ecx: c2b85de0 edx: c3fbe388
esi: c2b85de0 edi: 00000002 ebp: 00000008 esp: c3fe7fac
ds: 0018 es: 0018 ss: 0018
Process kupdate (pid: 3, process nr: 3, stackpage=c3fe7000)
Stack: 00000002 c012437d c2b85de0 c2b85480 000000e3 00000001 c0125285
c2b85de0
c3fe6000 c01db04b c3fe61c2 00001000 c2b85de0 c01256c8 00000f00
c3ff9fc0
c0106000 c010651f 00000000 00000f00 c0221fd8
Call Trace: [<c012437d>] [<c0125285>] [<c01db04b>] [<c01256c8>]
[<c0106000>] [<c010651f>]
Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d

(Typically, however, it's been 'knfsd' or 'rpc.nfsd' that's been the
process causing the problem. This time I see it's 'kupdate'.)

The machine is a P-200/MMX with 64MB of RAM and the only thing "unusual"
about the machine is that I have a 36GB RAID with three IDE hard drives
(Maxtor 91303D6) on a Promise UDMA-33 controller (PDC20246) and the md0
device is nearly full. I suspected that the errors might be caused by
memory problems so I've already swapped out the Mobo, including CPU and
RAM and the crashes have persisted. The only other cards in it are an
old ET4000 ISA VGA card, an Intel PCI EEPro 10/100 NIC running at
100Mbps to a 10/100 switch and a PCI Adaptec AIC-7850 driving the DAT on
the box.

Please someone help! This is really getting frustrating because I'm
rebooting and waiting for the huge RAID to fsck several times a day
now... It's essentially worthless as a file server because the dang
thing won't stay up long enough to move files on to/off of it!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
With Microsoft products, failure is not Derek Glidden
an option - it's a standard component. http://3dlinux.org/
Choose your life. Choose your http://www.tbcpc.org/
future. Choose Linux. http://www.illusionary.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/