negative NFS cookies: bad C library or bad kernel?

From: Kevin Buhr (buhr@stat.wisc.edu)
Date: Sat Dec 02 2000 - 23:49:16 EST


Trond:

Fiddling with the Cryptographic File System the other day, I managed to
tickle a mysterious bug. When some directories grew large enough,
suddenly a chunk of files would half "disappear". "find" would list
them fine, but "ls" and "echo *" wouldn't.

After a bit of troubleshooting, I discovered that the CFS daemon
(which presents itself to the system as an NFS daemon) was using
small, big-endian cookies in its directory entries. These became
large positive and negative little-endian "d_off" values in the dirent
structs.
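
To make the byte-order effect concrete, here is a minimal sketch. It is not code from CFS or the kernel, and the helper name cookie_as_le_offset is hypothetical. It assumes a 32-bit cookie written in big-endian order and then read back, byte for byte, as a signed little-endian value:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: lay a small cookie out in
   big-endian byte order, then reinterpret the same four bytes as a
   signed little-endian 32-bit value, which is how a little-endian
   client would see it if no byte swapping took place. */
int32_t cookie_as_le_offset(uint32_t cookie)
{
    unsigned char wire[4];
    uint32_t le;
    int32_t off;

    /* Big-endian layout: most significant byte first. */
    wire[0] = (unsigned char)((cookie >> 24) & 0xff);
    wire[1] = (unsigned char)((cookie >> 16) & 0xff);
    wire[2] = (unsigned char)((cookie >> 8) & 0xff);
    wire[3] = (unsigned char)(cookie & 0xff);

    /* Little-endian read: the first byte is the least significant. */
    le = (uint32_t)wire[0]
       | ((uint32_t)wire[1] << 8)
       | ((uint32_t)wire[2] << 16)
       | ((uint32_t)wire[3] << 24);

    /* Copy the bit pattern into a signed type without relying on
       implementation-defined integer conversion. */
    memcpy(&off, &le, sizeof off);
    return off;
}
```

Under these assumptions, cookie 1 comes out as 16777216, while any cookie whose low byte is 0x80 or above (128, for example) sets the sign bit and comes out negative.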

The C library (in glibc-2.1.3/sysdeps/unix/sysv/linux/getdents.c) does
some fancy, double-buffering footwork in getdents(2) to try to guess
how many bytes of kernel_dirents it needs to read into a temporary
buffer to fill the user-supplied buffer with user dirents (which have
an extra "d_type" field). When its heuristic screws up, it does an
lseek on the directory so the next getdents(2) will start with the
right directory entry:

      if ((char *) dp + new_reclen > buf + nbytes)
        {
          /* Our heuristic failed. We read too many entries. Reset
             the stream. `last_offset' contains the last known
             position. If it is zero this is the first record we are
             reading. In this case do a relative search. */
          if (last_offset == 0)
            __lseek (fd, -retval, SEEK_CUR);
          else
            __lseek (fd, last_offset, SEEK_SET);
          break;
        }

In my case, for "ls" and "bash", the "last_offset" happened to be a
negative little-endian cookie. The kernel's "default_lseek" returned
EINVAL, the error was ignored, and "ls" and "bash" were blissfully
unaware that a bunch of directory entries had been read into the
temporary buffer and forever lost. Since "find" used a different
buffer size, it happened to have a positive little-endian cookie for
"last_offset" and didn't exhibit the problem.
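
The ignored failure is easy to reproduce outside of NFS. The sketch below is only a stand-in: it assumes an ordinary regular file rather than an NFS directory, and the helper name seek_set_checked is hypothetical. It performs the same kind of SEEK_SET call as the quoted code, but reports the error instead of discarding it:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only: do an absolute seek and
   return 0 on success or the errno value on failure, so the caller can
   notice that the file position was NOT reset. */
int seek_set_checked(int fd, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return errno;   /* e.g. EINVAL for a negative offset */
    return 0;
}
```

On a regular file, POSIX specifies EINVAL when the resulting offset would be negative, so seek_set_checked(fd, -16777216) is expected to return EINVAL and leave the position where it was. A caller that ignores the result, as the quoted code does, carries on as if the rewind had succeeded.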

A fix was easy---after modifying CFS to convert its cookies to small,
little-endian numbers, everything worked fine.

However, who's to blame here? It can't be CFS---any four-byte cookie
should be valid, right?

Is the kernel NFS client code to blame? If it's going to be using
cookies as offsets, shouldn't we have an nfs_lseek that special-cases
directory lseeks (at least those using SEEK_SET) to take negative
offsets, so utilities and libraries don't need to be bigfile-aware
just to read directories? And what in the world can we do about bogus
code like the call:

            __lseek (fd, -retval, SEEK_CUR);

that appears above? Shouldn't any non-SEEK_SET lseek on an NFS
directory fail with an error?

Any thoughts?

Thanks.

Kevin <buhr@stat.wisc.edu>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


