Re: knfsd support for CROSSMNT

Olaf Kirch (okir@monad.swb.de)
Thu, 23 Sep 1999 11:53:46 +0200


On Thu, Sep 23, 1999 at 03:44:51PM +1000, Neil Brown wrote:
> My best guess as to what this means is that it is allowed to cross
> mount points. In any case, I have taken it to mean that, and
> implemented it with the following patch.

Yes, that was my intention originally. I didn't implement it mostly
because the inode # munging is such a pain.

> My first question is: should we leave the flag called "crossmnt" or
> should we change to the name that SGI/IRIX uses and call it "nohide"?

I don't care :-)

> If a server exports several filesystems allowing CROSSMNT, then they
> will appear to be one filesystem to the client. This means that they
> will need to have distinct inode numbers.

The user space nfsd is facing the exact same problem, and it is
a continuing sore.

The approach unfsd has taken for ages was what you're suggesting as
`the right solution', i.e. bitwise reversing the device major minor,
and xoring them with the inode, so that their least significant bits
are xored with the inode's most significant ones, and vice versa.

This has worked quite well for a long time, but I'm seeing an increasing
number of inode clashes since large disks have become so cheap. 9 gig
seems to be a magic threshold: The default mke2fs ratio of inode/blocks
yields 2M inodes for an 8G disk, i.e. your inode numbers require 21 bits.
You're bound to get clashes this way.

The unfsd now supports an alternative approach that is slightly less
prone to clashes, which is to use a static mapping of devices to indices,
e.g.
0 /dev/sda1
1 /dev/hda
2 /dev/sdb3
...

and to create those pseudo inode numbers like this:

index 0-7 0ddd............................
index 8-15 10ddd...........................
index 16-27 110ddd..........................

etc, where ddd is the device index modulo 7, and the dots represent
the bits available for the inode. This allows one to have up to
8 disks with up to 256M inodes, guaranteed to not produce clashes.

Needless to say, this kind of indexing is A) a lousy hack, and B) hard
to do in the kernel.

> Presumably something similar would need to be done for nfsv3, but it
> seems to use 64bit inums, and I haven't looked into how the client
> side maps these into linux's 32bit inums.

Not at all. It wants either the lower or upper 32bits (in case of
a different endianness) to be zero.

> Is it worth going through these hoops to fiddle the inode number, or
> should we change the NFS client so that it doesn't depend on them so
> much?

One possible approach might be for the client to transparently create
a (hidden?) mount point when it notices that the server sends a different
device number in a LOOKUP response.

> (how do other OSes NFS clients cope?)

Most of them don't.

Olaf

-- 
Olaf Kirch         |  --- o --- Nous sommes du soleil we love when we play
okir@monad.swb.de  |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
okir@caldera.de    +-------------------- Why Not?! -----------------------
         UNIX, n.: Spanish manufacturer of fire extinguishers.            

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/