Re: OT:fault tolerant nfs HOW?

David Woodhouse (David.Woodhouse@mvhi.com)
Mon, 05 Apr 1999 16:25:39 +0100


e9525748@student.tuwien.ac.at said:
> we need a non-stop file server for linux computers. it should be linux
> based. if one node of the file server goes down, the other should
> quickly take over operations. it is vital to ensure that the files are
> kept synchronized all the time in normal operations; however, in time
> fo failure a small backlay may be tolerable...

I'm not sure how to do the synchronisation. Without it, you can still improve
reliability by keeping the machine with the filesystems on it completely
isolated from the outside world...

Put all your filesystems in a single machine on a private internal network.

Connect two servers to this internal network, and also to the outside network.
Mount your filesystems by NFS on these machines and re-export them to your
clients from there.

Set up DNS so that 'nfs-server.xxx.internal' points to both of your relay
machines, so about half of the clients will be using each one. Make them ping
each other on the internal network, and when one goes down, the other can take
over its public IP address and seamlessly continue service to the outside
world.

This still leaves you with a single point of failure, but because it's
completely isolated, it's less likely to fail. Not ideal, but perhaps useful.

Another problem with this is that you have to use the user-space nfsd on your
'relay' machines, unless you make individual NFS export entries for _every_
possible client. This is because knfsd is broken wrt network exports^W^W^W^W^W
won't start serving a client until it sees a mount request from that client,
and if the client has already sent its mount request to the other (now dead)
machine, the one that takes over will never authenticate it.

This scheme relies on the fact that NFS file handles from one machine will be
valid (or at least decipherable) on the other. I _think_ that they would be -
could anyone else confirm that?

---- ---- ----
David Woodhouse David.Woodhouse@mvhi.com Office: (+44) 1223 810302
Project Leader, Process Information Systems Mobile: (+44) 976 658355
Axiom (Cambridge) Ltd., Swaffham Bulbeck, Cambridge, CB5 0NA, UK.
finger dwmw2@ferret.lmh.ox.ac.uk for PGP key.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/