patch: NFS and O_EXCL. Also: is O_EXCL really atomic?

Miquel van Smoorenburg (miquels@q.cistron.nl)
14 Sep 1996 16:59:21 +0200


Using a mail spool on an NFS-mounted partition is not really safe, because
locking (fcntl() and flock()) doesn't work over NFS without a lockd. That's
why most systems also use dotlocking (mailbox.lock files). This works
if the MTA and MUA use link() to lock the mailbox, since link() is
guaranteed to be atomic, even over NFS.
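
For readers unfamiliar with link()-based dotlocking, here is a minimal
userspace sketch of the idea. The helper names dotlock_acquire() and
dotlock_release() and the temp-file naming scheme are assumptions made for
illustration; they are not taken from any particular MTA or MUA. The link
count of the temp file is checked instead of trusting link()'s return value,
since a reply can get lost over NFS.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to take the dotlock `lockpath`.
 * Returns 0 if the lock was acquired, -1 otherwise.
 */
static int dotlock_acquire(const char *lockpath)
{
    char host[256];
    char tmp[1024];
    struct stat st;
    int fd, n, locked;

    if (gethostname(host, sizeof(host)) != 0)
        return -1;
    host[sizeof(host) - 1] = '\0';

    /* Temp name in the same directory, unique per host and process. */
    n = snprintf(tmp, sizeof(tmp), "%s.%s.%ld",
                 lockpath, host, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;

    fd = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    /* The atomic step: only one link() to lockpath can succeed. */
    (void)link(tmp, lockpath);

    /* A link count of 2 means our link() is the one that succeeded. */
    locked = (stat(tmp, &st) == 0 && st.st_nlink == 2);
    unlink(tmp);
    return locked ? 0 : -1;
}

/* Hypothetical helper: drop the lock by removing the lock file. */
static int dotlock_release(const char *lockpath)
{
    return unlink(lockpath);
}
```

A caller would typically retry dotlock_acquire() with a short sleep and give
up after some timeout.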

However, most systems use open(file, O_CREAT|O_WRONLY|O_EXCL, mode) to
create a lockfile. NFS doesn't know about O_EXCL, so this still doesn't
work. I created a patch for the Linux NFS client code that guarantees
an atomic create by first creating a temporary file and then linking it
to its destination. I've tested this with a Solaris server and a Linux client,
between multiple Linux clients, and with Linux NFS servers. It appears to
work fine (until you use non-Linux NFS clients, of course). What do others
think of this? The patch is at the end of this message.
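
For reference, the O_EXCL style of lockfile creation mentioned above looks
roughly like this in userspace. The helper name excl_lock_try() and its
return convention are assumptions made for this sketch. It is only as safe
as the filesystem's handling of O_EXCL, which is exactly the point in
question for NFS.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to create `lockpath` exclusively.
 * Returns 0 if we created it (lock taken), 1 if it already existed
 * (lock held by someone else), -1 on any other error.
 */
static int excl_lock_try(const char *lockpath)
{
    int fd = open(lockpath, O_CREAT | O_WRONLY | O_EXCL, 0644);

    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return (errno == EEXIST) ? 1 : -1;
}
```

The lock is released by unlink()ing the lockfile.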

This also brings up another question. In the Linux FS code, O_EXCL
is handled by the VFS layer, which first tests whether the file is present
and then creates it if not. It looks to me as if another process could be
scheduled between the test and the create. The chance is really small, but
still. I don't know enough about the VFS layer to tell whether this is
really true, though.
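
The kind of window being asked about can be illustrated in userspace with a
deliberately racy test-then-create. This is an analogy only, not the kernel
code path, and the function name racy_create() is made up for the sketch. If
two processes both pass the stat() check before either one calls open(),
both "succeed" in creating the file.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical, intentionally non-atomic "exclusive" create.
 * Returns a file descriptor on success, or -1 with errno set to
 * EEXIST if the file was seen to exist.
 */
static int racy_create(const char *path)
{
    struct stat st;

    /* Step 1: test whether the file is present. */
    if (stat(path, &st) == 0) {
        errno = EEXIST;
        return -1;
    }

    /*
     * Race window: another process may create `path` right here,
     * and the open() below would still succeed.
     */

    /* Step 2: create the file (no O_EXCL, so no protection). */
    return open(path, O_CREAT | O_WRONLY, 0644);
}
```

Whether the same window exists inside the kernel depends on whether the test
and the create are done under one lock on the directory.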

If anyone is interested, I could post my "locktest" program that I used
to test the code. Here's the patch:

diff -ruN linux-2.0.20.orig/fs/namei.c linux-2.0.20/fs/namei.c
--- linux-2.0.20.orig/fs/namei.c Fri Sep 13 23:54:40 1996
+++ linux-2.0.20/fs/namei.c Sat Sep 14 00:56:14 1996
@@ -376,6 +376,11 @@
dir->i_count++; /* create eats the dir */
if (dir->i_sb && dir->i_sb->dq_op)
dir->i_sb->dq_op->initialize(dir, -1);
+ /*
+ * Put flag in *res_inode as a hint to the
+ * file system create code (esp. O_EXCL).
+ */
+ *res_inode = (struct inode *)flag;
error = dir->i_op->create(dir, basename, namelen, mode, res_inode);
up(&dir->i_sem);
iput(dir);
diff -ruN linux-2.0.20.orig/fs/nfs/dir.c linux-2.0.20/fs/nfs/dir.c
--- linux-2.0.20.orig/fs/nfs/dir.c Sat Jul 20 12:32:03 1996
+++ linux-2.0.20/fs/nfs/dir.c Sat Sep 14 13:04:54 1996
@@ -6,6 +6,8 @@
* nfs directory handling functions
*
* 10 Apr 1996 Added silly rename for unlink --okir
+ * 14-Sep-1996 Added atomic open (O_EXCL) through nfs_proc_link --miquels
+ *
*/

#include <linux/sched.h>
@@ -14,6 +16,7 @@
#include <linux/nfs_fs.h>
#include <linux/fcntl.h>
#include <linux/string.h>
+#include <linux/utsname.h>
#include <linux/kernel.h>
#include <linux/malloc.h>
#include <linux/mm.h>
@@ -397,7 +400,12 @@
struct nfs_sattr sattr;
struct nfs_fattr fattr;
struct nfs_fh fhandle;
+ char tmp[16];
int error;
+ int flags;
+ int i;
+
+ flags = (int)*result;

*result = NULL;
if (!dir || !S_ISDIR(dir->i_mode)) {
@@ -412,14 +420,52 @@
sattr.mode = mode;
sattr.uid = sattr.gid = sattr.size = (unsigned) -1;
sattr.atime.seconds = sattr.mtime.seconds = (unsigned) -1;
- if ((error = nfs_proc_create(NFS_SERVER(dir), NFS_FH(dir),
- name, &sattr, &fhandle, &fattr))) {
- iput(dir);
- return error;
- }
- if (!(*result = nfs_fhget(dir->i_sb, &fhandle, &fattr))) {
- iput(dir);
- return -EACCES;
+
+ /*
+ * NFS has no atomic creat-if-not-exists (O_EXCL) but we
+ * can emulate it by creating the file under a temporary
+ * name, and then trying to link it to its destination.
+ * This way file locking using O_EXCL works over NFS.
+ */
+ if (flags & O_EXCL) {
+ /*
+ * Try to make a unique temp name, network-wide.
+ */
+ strcpy(tmp, ".nfs");
+ for(i = 0; i < 5 && system_utsname.nodename[i] &&
+ system_utsname.nodename[i] != '.'; i++)
+ tmp[4 + i] = system_utsname.nodename[i];
+ sprintf(tmp + 4 + i, "%05d", current->pid);
+ if ((error = nfs_proc_create(NFS_SERVER(dir), NFS_FH(dir),
+ tmp, &sattr, &fhandle, &fattr))) {
+ iput(dir);
+ return error;
+ }
+ /*
+ * Link the temp file to the destination file.
+ * We have to get the temp file handle first.
+ */
+ if (!(*result = nfs_fhget(dir->i_sb, &fhandle, &fattr))) {
+ iput(dir);
+ return -EACCES;
+ }
+ error = nfs_proc_link(NFS_SERVER(dir), NFS_FH(*result),
+ NFS_FH(dir), name);
+ (void)nfs_proc_remove(NFS_SERVER(dir), NFS_FH(dir), tmp);
+ if (error < 0) {
+ iput(dir);
+ return error;
+ }
+ } else {
+ if ((error = nfs_proc_create(NFS_SERVER(dir), NFS_FH(dir),
+ name, &sattr, &fhandle, &fattr))) {
+ iput(dir);
+ return error;
+ }
+ if (!(*result = nfs_fhget(dir->i_sb, &fhandle, &fattr))) {
+ iput(dir);
+ return -EACCES;
+ }
}
nfs_lookup_cache_add(dir, name, &fhandle, &fattr);
iput(dir);

Mike.

-- 
   Miquel van      | Cistron Internet Services   --    Alphen aan den Rijn.
   Smoorenburg,    | mailto:info@cistron.nl          http://www.cistron.nl/
miquels@cistron.nl |           The truth is out there. 42.