SMP locks up when NFS times out

Lee Hetherington (ilh@lcs.mit.edu)
Mon, 19 Aug 1996 14:52:09 -0400


I've been having real problems with SMP kernels 2.0.13 (and 2.0.11 and
2.0.12, for that matter) locking up my machine. After much
experimentation I have determined that the lockup occurs whenever an
"NFS server foo not responding" message would have been printed. When
running SMP, the machine locks up forever. When running the same kernel
non-SMP, the messages show up and the machine does not lock up (it also
resumes NFS normally once the connection is restored).

When I say locked up, I mean that it won't respond to ping or finger
from another machine, and it won't respond to the keyboard in any way
(not even ctrl-alt-delete or rshift-lock etc.). Only a hard reset brings
it back.

This is a very serious problem. We're trying to get some dual P6-200
machines (DELL Optiplex GX Pros) up and running Linux, and this is a
real blow to us. I guess this is what it means to be running
"experimental" SMP.

I first noticed that the machine was crashing every night when our
fileserver got slow during incremental backups. Other machines would
show "NFS server not responding", but the Linux-SMP machines would lock
up for good.

I can now lock up the machine at will by doing the following:

1. start a big NFS read (tens of MB) with something like

% sum /big/nfs/file

2. wait 10 seconds or so

3. pull the ethernet connection

As soon as the first "NFS server not responding" message is due (I
think it actually gets printed, but now I'm not sure), the machine is
locked up for good.
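
For anyone who wants to try this, the steps above can be wrapped in a
small shell function. This is only a sketch: the function name
repro_nfs_read and its two arguments are made up for illustration, and
pulling the cable is still done by hand when prompted.

```shell
#!/bin/sh
# Sketch of the reproduction steps. repro_nfs_read is a hypothetical
# helper name, not part of any existing tool.
repro_nfs_read() {
    file=$1          # large file on an NFS mount, e.g. /big/nfs/file
    delay=${2:-10}   # seconds to let the read get going (default 10)

    sum "$file" &    # step 1: start the big read in the background
    pid=$!
    sleep "$delay"   # step 2: wait a bit
    # step 3: manual action
    echo "read running as pid $pid; pull the ethernet cable now"
    wait "$pid"      # returns only if the read ever completes
    echo "read finished with status $?"
}
```

Usage would be something like "repro_nfs_read /big/nfs/file 10". On a
machine that survives, the last line is printed once the cable is
plugged back in and the read completes; on an affected SMP machine you
would not expect to get that far.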

**Could someone else running Linux SMP try out this lockup procedure?**

I have tried 2.0.13, 2.0.12, and 2.0.11 SMP with the same fatal lockups.

Any ideas? Please help! I'd love to help track this down, but I don't
know where to begin.

Lee Hetherington
ilh@lcs.mit.edu