Re: SMP locks up when NFS times out

Lee Hetherington (ilh@lcs.mit.edu)
Mon, 19 Aug 1996 20:07:56 -0400


Darrin R. Smith writes:
| It isn't a fix, but you could/should try adding the 'soft'
| option to the mount option list. With soft mounting of nfs volumes,
| Linux should retry the connection once and then give up gracefully if
| the remote machine doesn't respond.
| I use this with the machines at work (not on the Linux machine
| though since it doesn't have any nfs mounts), and it works well under
| AIX, Solaris, and SunOS.

Thanks. I am well aware of 'soft' mounting and find it very undesirable
in our environment. We typically have long-term jobs running around the
clock on many machines started by many different users. With 'hard'
mounts all of those jobs just suspend while we mess with the network or
the fileserver. When everything is back to normal all those jobs
suspended because of the NFS outage just start up as if nothing happened
to them. SunOS, Solaris, and OSF do just fine in this mode. I'm
hoping Linux can handle this too, or we're going to have a tough time
with Linux (and probably will have to go to Solaris x86 instead out of
necessity -- not my first choice). In our environment, it just doesn't
make sense to use a machine at all if the fileserver and net aren't up.
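
To make the difference concrete, here is a toy model in Python of the
two behaviours as I understand them. It is only a sketch, not kernel
code: the function name, the `server_up_at` parameter, and the default
retry count are made up for illustration.

```python
def nfs_request(server_up_at, mode, retrans=3):
    """Toy model of an NFS client request (illustrative only).

    server_up_at: hypothetical 1-based attempt number at which the
                  server starts answering again.
    mode:         "hard" retries forever; "soft" gives up after
                  `retrans` unanswered attempts.
    Returns the attempt number on which the request succeeded.
    """
    attempt = 0
    while True:
        attempt += 1
        if attempt >= server_up_at:
            # Server answered: the caller just sees a slow success.
            return attempt
        if mode == "soft" and attempt >= retrans:
            # Soft mount: the error is handed back to the application.
            raise OSError("soft mount: gave up after %d attempts" % attempt)
        # Hard mount: fall through and retry; the job simply waits.


if __name__ == "__main__":
    print(nfs_request(10, "hard"))      # job resumes once the server is back
    try:
        nfs_request(10, "soft")
    except OSError as err:
        print(err)                      # job sees an I/O error instead
```

With a hard mount the long-running job never sees the outage, only a
delay. With a soft mount it gets an error in the middle of its run,
which is exactly what we want to avoid.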

...

On a somewhat unrelated note, I have been able to lock up only NFS on a
UP machine running 2.0.13 (and 2.0.11). It is an Intel Performance
AU with a single P6-200, 2940 SCSI, and 3c590 ethernet. This lockup is
not due to pulling the ethernet, but just congestion on the ethernet. I
start getting "NFS server not responding" messages, and then even after
the congestion is gone I have no more NFS. Otherwise, I can still run
some things on the machine and the keyboard remains responsive. In
addition to the NFS not responding messages, I'm also seeing messages
about eth0 transmission time-outs. I'm not sure if this is NFS or 3c59x
related.

Lee Hetherington
ilh@lcs.mit.edu