Re: OT:fault tolerant nfs HOW?

Pavel Machek (pavel@bug.ucw.cz)
Wed, 7 Apr 1999 22:16:32 +0200


Hi!

> > "Normal" filehandles have slight chance to be valid on other computer,
> > but fake handles you get when re-exporting nfs will not be valid for
> > sure.
>
> What if the 'internal' share was Coda, not NFS? Because of the local cache,
> that'd improve performance, and would give some approximation of service
> even if the main server went down. Writes to the filesystem while the central
> server was down wouldn't be propagated between the two relay hosts, though,
> and may cause conflicts when they were eventually written back to the main
> server. I don't know how Coda handles that.

Hmm, if you are using Coda, why not use Coda for the whole network? Then
there is no need to use NFS _at all_. Coda already has features such as
server replication.

> If the filesystems to be shared were read-only, you could just use software
> RAID and nbd, and mount them on all the server hosts. But that doesn't work
> when the filesystems are read-write.

Someone was working on a solution like this with nbd and read-write. One
host was the master, the second one was the backup. When the master failed,
the backup took over, fsck-ed the disk, mounted it read-write and continued
work.
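
The takeover sequence described above could be sketched roughly like this.
It is only an illustration: the class name, the heartbeat timeout and the
logged step names are all assumptions made for the example, not details of
the actual nbd work, and the steps are recorded in a list instead of really
running fsck or mount.

```python
import time


class BackupHost:
    """Hypothetical backup node; all names and the timeout are illustrative."""

    def __init__(self, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout          # seconds of silence before takeover
        self.clock = clock              # injectable clock, handy for testing
        self.last_heartbeat = clock()
        self.role = "backup"
        self.log = []                   # records the takeover steps in order

    def heartbeat(self):
        """Call this whenever the master is heard from."""
        self.last_heartbeat = self.clock()

    def master_alive(self):
        return self.clock() - self.last_heartbeat < self.timeout

    def poll(self):
        """Take over once, when the master has been silent for too long."""
        if self.role == "backup" and not self.master_alive():
            self.take_over()
        return self.role

    def take_over(self):
        # Order matters: repair the disk first, then mount it read-write,
        # and only then start serving requests.
        self.log.append("fsck")
        self.log.append("mount read-write")
        self.log.append("serve")
        self.role = "master"
```

The backup would call poll() periodically. Note that a real setup also has
to make sure the old master is really dead before the disk is mounted
read-write, which this sketch does not attempt.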

> Perhaps what we need is a filesystem that can be written concurrently by many
> different hosts? Unfortunately, I don't think it's even possible to build the
> necessary synchronisation primitives with our current block device hardware.
>
> We could enforce block device access for these filesystems to be _only_ through
> nbd, and not allow direct access to disks which happen to be local, or on a
> shared SCSI bus. This would allow us to build in synchronisation primitives at
> the nbd layer. Would that be enough?

I don't know. But even if it were enough, it still looks like a lot of
work :-).

> Alternatively, we can have such a cluster always elect a single host to be the
> 'writer', and ensure that the filesystem is always in a coherent state (cf.
> JFS). Read requests are served straight from the block device(s), and write
> requests are forwarded to the 'writer'. If the writer goes down, another
> election is held and a new writer is selected.

You cannot have a one-writer, multiple-readers configuration: the readers
have their own caches (and no invalidation is done), so a reader keeps
serving cached blocks after the writer has changed them on disk. It will
not work.
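
A toy model of the cache problem, as a sketch only: the shared disk is a
plain dictionary, and Disk, Writer, Reader and invalidate() are hypothetical
names invented for the example. The invalidate() call stands for exactly the
mechanism that the proposed setup lacks.

```python
class Disk:
    """Shared block device, modeled as block number -> contents."""

    def __init__(self):
        self.blocks = {}


class Writer:
    """The single elected writer; it writes straight to the shared disk."""

    def __init__(self, disk):
        self.disk = disk

    def write(self, n, data):
        self.disk.blocks[n] = data


class Reader:
    """A read-only host with its own block cache."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def read(self, n):
        # Only go to the disk on a cache miss, like a buffer cache would.
        if n not in self.cache:
            self.cache[n] = self.disk.blocks.get(n)
        return self.cache[n]

    def invalidate(self, n):
        # Hypothetical hook: nothing in the proposed setup would call this.
        self.cache.pop(n, None)


disk = Disk()
writer = Writer(disk)
reader = Reader(disk)

writer.write(0, b"old")
first = reader.read(0)      # cache miss, block 0 is fetched from the disk
writer.write(0, b"new")     # the reader is never told about this write
stale = reader.read(0)      # cache hit, the reader still sees the old data
reader.invalidate(0)
fresh = reader.read(0)      # only after invalidation is the disk re-read
```

Here stale still holds the old contents although the disk already has the
new ones. With a real filesystem it is worse than stale file data, because
cached metadata blocks go out of date in the same way.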

Pavel

-- 
I'm really pavel@atrey.karlin.mff.cuni.cz. 	   Pavel
Look at http://atrey.karlin.mff.cuni.cz/~pavel/ ;-).
