NFS

Dan Merillat (Dan.Merillat@ao.net)
Tue, 5 Mar 1996 19:25:07 -0500 (EST)


Ok, I have a problem with NFS: if the NFS server crashes, then
all Linux boxes using it have to be rebooted to use it again.

client# mount server:/usr/local/other /usr/local/other
client# cd /usr/local/other
client# ls
.....
server# Aiee! ... and dies
client# ls

and the client hangs forever in the waiting-for-disk state.
(It should time out, at least.)

Ok, the server is back up, and I killed the session on the client (the
shell; ls is now effectively toast).

client# mount server:/usr/local/other /usr/local/other
mount: server:/usr/local/other already mounted or /usr/local/other busy
client# umount /usr/local/other
umount: server:/usr/local/art: device is busy
client# mount -o remount /usr/local/other
aha, success!
client# cd /usr/local/other
client# ls
and it hangs.
server _IS_ up.

1) Umount of NFS should always work, regardless of whether it is in use.
2) Anything using an NFS-mounted partition should be able to have it dropped
out from under it without hanging the machine (return EIO or similar).
3) Remount of an NFS partition should re-contact the server and really
re-establish the connection, instead of just returning success.
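
To make item 2 concrete from the application side, here is a minimal
sketch of what a program would see if the kernel failed the request with
EIO instead of sleeping forever. It assumes the kernel behaves as item 2
asks; read_some is a made-up helper name, not an existing API.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from path into buf.
 * Returns the byte count, or -1 with errno set. If the read fails
 * with EIO, the caller gets control back and can decide what to do.
 */
ssize_t read_some(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;              /* errno was set by open() */

    ssize_t n = read(fd, buf, len);
    int saved = errno;          /* close() may overwrite errno */
    close(fd);

    if (n < 0) {
        if (saved == EIO)
            fprintf(stderr, "%s: I/O error, server gone?\n", path);
        errno = saved;
        return -1;
    }
    return n;
}
```

With that behavior, an ls on a dead mount would print an error and exit
rather than taking the shell down with it.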

I know we have a better NFS client now, but unless this has been fixed in a
newer kernel (the client is a 1.2.10 + aic7xxx patches) it kinda makes
NFS worthless because it can take out a machine.

Also, there should be general timeouts on waiting for disk, and processes
waiting on disk should be killable. (Zombies too...)

Perhaps a syscall, ret_ioerr(int process), would be in order?

How bad would it be to let a process that is waiting on disk activity die?
I suppose you would need to put something there to catch the return from
the disk, but that would only eat a buffer -> /dev/null, so that's no biggie
(is it?)
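
The timeout half of this can at least be approximated from user space.
The sketch below forks a child to touch the path and stops waiting after
a deadline, so the caller never blocks on a dead server. It assumes POSIX
fork/waitpid/nanosleep; probe_path and its return codes are made up for
illustration. It does not make the stuck process killable; that part
still needs kernel support.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): check whether `path`
 * answers a stat() within about timeout_sec seconds.
 *
 * Returns 0 if stat() succeeded, 1 if it failed, 2 on timeout,
 * and -1 if fork() or waitpid() failed.
 */
int probe_path(const char *path, int timeout_sec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this is the call that may block on a dead server. */
        struct stat st;
        _exit(stat(path, &st) == 0 ? 0 : 1);
    }

    /* Parent: poll every 100 ms so it never blocks on the child. */
    struct timespec tick = { 0, 100 * 1000 * 1000 };
    for (int waited = 0; waited <= timeout_sec * 10; waited++) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /*
     * Timed out. SIGKILL is sent, but a child stuck in disk wait may
     * not die until the I/O returns, so the parent does not wait for
     * it and the child is left unreaped.
     */
    kill(pid, SIGKILL);
    return 2;
}
```

A script could call probe_path("/usr/local/other", 5) before touching
the mount and skip it on a return of 2, at the cost of possibly leaving
one stuck child behind per probe.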

Just some stability issues,
--Dan