knfsd support for CROSSMNT

Neil Brown (neilb@cse.unsw.edu.au)
Thu, 23 Sep 1999 15:44:51 +1000 (EST)


The knfsd sources mention an export option called "CROSSMNT" which is
unimplemented.
My best guess is that it means a lookup on an exported filesystem is
allowed to cross mount points. In any case, I have taken it to mean
that, and implemented it with the following patch.

To use it you need to be able to set the CROSSMNT flag, which means a
small change to mountd/exportfs (left as an exercise for the moment).
You also need to make sure that all subordinate file systems are
in the kernel export table. With the current tools this means that
they have to be exported to explicit hosts (not wild cards or
netgroups). I hope to make it work more generally soonish.
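
The decision made when a lookup crosses a mount point can be sketched
in user space roughly as follows. This is only an illustration under
stated assumptions: struct demo_exp_entry, DEMO_CROSSMNT, demo_exp_get()
and demo_cross() are hypothetical names, the export table is a flat
array keyed on a device number only, and the per-client matching that
the real exp_get() does is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CROSSMNT 0x1u      /* hypothetical flag bit, not the kernel's value */

/* Hypothetical, simplified export table entry. */
struct demo_exp_entry {
    uint16_t dev;               /* device holding the exported filesystem */
    uint32_t flags;             /* export options */
};

/* Linear search of the export table for a device; NULL if not exported. */
static const struct demo_exp_entry *
demo_exp_get(const struct demo_exp_entry *tbl, size_t n, uint16_t dev)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].dev == dev)
            return &tbl[i];
    return NULL;
}

/*
 * Pick the export to use for a child found during lookup.
 * Returns the parent's export if no mount point was crossed, the
 * child's own export if crossing is allowed and the child filesystem
 * is in the table, and NULL otherwise (meaning: do not cross, fall
 * back to the covered directory on the parent filesystem).
 */
static const struct demo_exp_entry *
demo_cross(const struct demo_exp_entry *tbl, size_t n,
           const struct demo_exp_entry *parent, uint16_t child_dev)
{
    if (child_dev == parent->dev)
        return parent;
    if (!(parent->flags & DEMO_CROSSMNT))
        return NULL;
    return demo_exp_get(tbl, n, child_dev);
}
```

The NULL case is why every subordinate filesystem has to be in the
export table: without an entry, the client keeps seeing the empty
directory underneath the mount.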

My first question is: should we leave the flag called "crossmnt" or
should we change to the name that SGI/IRIX uses and call it "nohide"?

You may notice the interesting change in nfsxdr.c. This is the second
thing that I am after input on.
If a server exports several filesystems allowing CROSSMNT, then they
will appear to be one filesystem to the client. This means that they
will need to have distinct inode numbers.
This is particularly a problem for the Linux NFS client, as Linux uses
the inode number to look up inodes in the icache (it should really be
able to use the 32-byte filehandle, but that's another conversation).

The approach that I have taken is to XOR the bottom 8 bits of the
device into the top 8 bits of the inum. For most current filesystems,
I suspect the top 8 bits are 0 anyway.
This won't guarantee uniqueness if two filesystems are on devices which
have the same minor number but different major numbers, e.g. an IDE
disc and a SCSI disc.
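
A minimal user-space sketch of that clash, using the same arithmetic
as the nfsxdr.c change. It assumes a 16-bit device number with the
major in the high byte and the minor in the low byte; DEMO_MKDEV and
demo_inum_xor24() are hypothetical helpers, not kernel names.

```c
#include <stdint.h>

/* Assumption: 16-bit device number, major in the high byte and
 * minor in the low byte. DEMO_MKDEV is a hypothetical helper. */
#define DEMO_MKDEV(major, minor) \
    ((uint16_t)((((major) & 0xffu) << 8) | ((minor) & 0xffu)))

/* XOR the device number, shifted left by 24, into the inode number.
 * In a 32-bit result only the low 8 bits of dev (the minor, under
 * the assumption above) survive the shift. */
static uint32_t demo_inum_xor24(uint16_t dev, uint32_t ino)
{
    return ino ^ ((uint32_t)dev << 24);
}
```

Under these assumptions, inode 2 on device (3,1) and inode 2 on device
(8,1) both come out as 0x01000002, because the major number is shifted
out entirely.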

I feel that the closest to "right" is something like:

    minor = MINOR(dev);
    major = MAJOR(dev);
    ndev = 0;
    for (i = 0; i < 8; i++) {
        /* interleave one bit of minor and one bit of major */
        ndev = (ndev << 2)
             | (minor & 1)
             | ((major & 1) << 1);
        minor >>= 1;
        major >>= 1;
    }
    nfs_inum = inum ^ (ndev << 16);

so that the bits in the dev which provide most differentiation are
xored with the bits in the inum which provide the least
differentiation.
This would really require caching ndev, probably in the svc_export
structure, which is why I didn't bother doing it yet.
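
One way the caching could look, as a user-space sketch only. The
structure demo_export_cache is a hypothetical stand-in and not the
real svc_export; the function names are likewise made up, and major
and minor are assumed to be 8 bits each.

```c
#include <stdint.h>

/* Hypothetical stand-in for a per-export structure. */
struct demo_export_cache {
    unsigned int major;
    unsigned int minor;
    uint32_t ndev;              /* interleaved device bits, computed once */
};

/* Interleave the 8 bits of major and minor, lowest bits first, so the
 * low-order (most varied) device bits land in the high bits of ndev. */
static uint32_t demo_interleave(unsigned int major, unsigned int minor)
{
    uint32_t ndev = 0;

    for (int i = 0; i < 8; i++) {
        ndev = (ndev << 2) | (minor & 1) | ((major & 1) << 1);
        minor >>= 1;
        major >>= 1;
    }
    return ndev;
}

/* Compute ndev when the export is set up, not on every reply. */
static void demo_export_cache_init(struct demo_export_cache *e,
                                   unsigned int major, unsigned int minor)
{
    e->major = major;
    e->minor = minor;
    e->ndev = demo_interleave(major, minor);
}

/* Per-reply work is then a single shift and XOR. */
static uint32_t demo_nfs_inum(const struct demo_export_cache *e, uint32_t inum)
{
    return inum ^ (e->ndev << 16);
}
```

With this layout, devices (3,1) and (8,1) give ndev values of 0xE000
and 0x4200, so inode 2 on each maps to 0xE0000002 and 0x42000002
respectively and the two no longer clash.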

Presumably something similar would need to be done for NFSv3, but it
seems to use 64-bit inums, and I haven't looked into how the client
side maps these into Linux's 32-bit inums.

Is it worth jumping through these hoops to fiddle the inode number, or
should we change the NFS client so that it doesn't depend on inode
numbers so much? (How do other OSes' NFS clients cope?)

NeilBrown

This patch is against plain linux-2.2.12. It is independent of any
other current knfsd patches that I know of.
----------%<---------cut-here-for-patch--------->&------------
diff -cr fs/nfsd.orig/nfsxdr.c fs/nfsd/nfsxdr.c
*** fs/nfsd.orig/nfsxdr.c Thu Sep 23 14:58:44 1999
--- fs/nfsd/nfsxdr.c Thu Sep 23 14:57:45 1999
***************
*** 186,192 ****
*p++ = htonl((u32) inode->i_rdev);
*p++ = htonl((u32) inode->i_blocks);
*p++ = htonl((u32) inode->i_dev);
! *p++ = htonl((u32) inode->i_ino);
*p++ = htonl((u32) inode->i_atime);
*p++ = 0;
*p++ = htonl((u32) inode->i_mtime);
--- 186,192 ----
*p++ = htonl((u32) inode->i_rdev);
*p++ = htonl((u32) inode->i_blocks);
*p++ = htonl((u32) inode->i_dev);
! *p++ = htonl((u32) inode->i_ino ^ ( ((u32) inode->i_dev) << 24));
*p++ = htonl((u32) inode->i_atime);
*p++ = 0;
*p++ = htonl((u32) inode->i_mtime);
diff -cr fs/nfsd.orig/vfs.c fs/nfsd/vfs.c
*** fs/nfsd.orig/vfs.c Thu Sep 23 14:58:44 1999
--- fs/nfsd/vfs.c Thu Sep 23 14:57:45 1999
***************
*** 172,190 ****
dchild = lookup_dentry(name, dget(dparent), 0);
if (IS_ERR(dchild))
goto out_nfserr;
/*
* check if we have crossed a mount point ...
*/
if (dchild->d_sb != dparent->d_sb) {
struct dentry *tdentry;
! tdentry = dchild->d_covers;
! if (tdentry == dchild)
! goto out_dput;
! dput(dchild);
! dchild = dget(tdentry);
! if (dchild->d_sb != dparent->d_sb) {
! printk("nfsd_lookup: %s/%s crossed mount point!\n", dparent->d_name.name, dchild->d_name.name);
! goto out_dput;
}
}

--- 172,201 ----
dchild = lookup_dentry(name, dget(dparent), 0);
if (IS_ERR(dchild))
goto out_nfserr;
+
/*
* check if we have crossed a mount point ...
*/
if (dchild->d_sb != dparent->d_sb) {
struct dentry *tdentry;
! struct svc_export *exp2=NULL;
!
! if (EX_CROSSMNT(exp))
! exp2 = exp_get(rqstp->rq_client,
! dchild->d_inode->i_dev,
! dchild->d_inode->i_ino);
! if (exp2)
! exp = exp2;
! else {
! tdentry = dchild->d_covers;
! if (tdentry == dchild)
! goto out_dput;
! dput(dchild);
! dchild = dget(tdentry);
! if (dchild->d_sb != dparent->d_sb) {
! printk("nfsd_lookup: %s/%s crossed mount point!\n", dparent->d_name.name, dchild->d_name.name);
! goto out_dput;
! }
}
}
