Re: NFS and pre-kernels.

From: Trond Myklebust (trond.myklebust@fys.uio.no)
Date: Wed Apr 19 2000 - 15:58:11 EST


>>>>> " " == Jim Wray <wray@neptune.cs.byu.edu> writes:

> I noticed that sometimes when I did copies from local to nfs,
> or from nfs to nfs, they didn't happen or they just never
> finished. On a couple of occasions they did however. It was
> because of those few successes I said I had not narrowed down
> the exact situations that caused problems.

Could you supply me with a tcpdump from both the server side and the
client side of a block read and/or write that fails? I suspect the
problem is your network quality: the new code supports wsize>4k, so
each request is split into more fragments and is more sensitive to a
missed one. In addition, if the server refuses our connection, it now
waits 5 seconds before retrying, in order to avoid flooding the
network (and also monopolizing the client itself) with retries the
way 2.2.x did.
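
To put a rough number on the fragment sensitivity: over UDP, a request
larger than the MTU goes out as several IP fragments, and losing any
one of them loses the whole request. The sketch below is a toy model,
not kernel code. The function names, the assumption that fragments are
dropped independently, and the 1480 bytes of payload per fragment
(Ethernet MTU of 1500 minus a 20-byte IP header, RPC and UDP headers
ignored) are all assumptions made for illustration.

```c
#include <assert.h>

/* Probability that a datagram split into `fragments` IP fragments is
 * lost, assuming each fragment is dropped independently with
 * probability `p_frag`. This is an idealised model, not a measurement. */
double datagram_loss(double p_frag, unsigned int fragments)
{
        double p_all_arrive = 1.0;
        unsigned int i;

        assert(p_frag >= 0.0 && p_frag <= 1.0);
        for (i = 0; i < fragments; i++)
                p_all_arrive *= 1.0 - p_frag;
        return 1.0 - p_all_arrive;
}

/* Number of fragments needed to carry `payload` bytes when each
 * fragment holds `per_fragment` bytes (hypothetical helper, rounds up). */
unsigned int fragments_needed(unsigned int payload, unsigned int per_fragment)
{
        assert(per_fragment > 0);
        return (payload + per_fragment - 1) / per_fragment;
}
```

Under these assumptions, fragments_needed(8192, 1480) gives 6, and
datagram_loss(0.01, 6) comes to a little under 6%, against 1% for a
request that fits in a single packet.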

If your server supports NFS over TCP, you might want to check whether
that improves matters. TCP can ask for resends of individual
fragments, and hence works much better on noisy networks.

> With listing, I noticed that the only time I had problems was
> with a directory that I copied while using one of the pre
> kernels. Then when typing ls, the machine would use all of the
> processor, and not ever do the listing. It was as if it had
> entered into infinite recursion. I suppose the conclusion is
> that there is no problem with ls, but that the copy was
> recursively writing in the directory. I never did check that
> thoroughly though.

The readdir algorithm was changed in 2.3.99-pre4. I hope the new one
works better. It is still known to hang on certain servers; the
appended patch, which cures the hangs themselves, has been sent to
Linus. That particular problem is known to affect connections to
Netware, and possibly some SGI servers.
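
The last hunk of the appended patch tests whether a readdir cookie is
smaller than the one before it, and stops trusting cookies if so. A
standalone sketch of that test, outside the kernel, might look like
the following. The function name and the u64 typedef are hypothetical
stand-ins for illustration; only the comparison itself is taken from
the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long u64; /* stand-in for the kernel's u64 */

/* Return the index of the first cookie that is smaller than its
 * predecessor, or -1 if the sequence never decreases. Equal
 * neighbours are not treated as a violation, matching the
 * "prev_cookie > cookie" comparison in the patch. */
long first_nonmonotone(const u64 *cookies, size_t n)
{
        size_t i;

        assert(cookies != NULL || n == 0);
        for (i = 1; i < n; i++)
                if (cookies[i - 1] > cookies[i])
                        return (long)i;
        return -1;
}
```

A caller that gets anything other than -1 back would, as in the patch,
fall back to the page offset scheme instead of relying on cookies.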

Cheers,
  Trond

diff -u --recursive --new-file linux-2.3.99-pre5-nfs-1/fs/nfs/dir.c linux-2.3.99-pre5-nfs-2/fs/nfs/dir.c
--- linux-2.3.99-pre5-nfs-1/fs/nfs/dir.c Sat Apr 1 18:04:27 2000
+++ linux-2.3.99-pre5-nfs-2/fs/nfs/dir.c Sun Apr 16 22:51:09 2000
@@ -94,13 +94,15 @@
         if (!p)
                 return -EIO;
         for(;;) {
+                u64 cookie;
                 p = (u8*)decode((__u32*)p, entry, plus);
                 if (IS_ERR(p))
                         break;
                 pg_offset = p - start;
                 entry->prev = entry->offset;
                 entry->offset = base + pg_offset;
-                if ((use_cookie ? entry->cookie : entry->offset) > offset)
+                cookie = use_cookie ? entry->cookie : (u64)entry->offset;
+                if (cookie > (u64)offset)
                         break;
                 if (loop_count++ > 200) {
                         loop_count = 0;
@@ -292,8 +294,9 @@
                 res = search_cached_dirent_pages(inode, off, dirent);
 
                 if (res >= 0) {
+                        u64 cookie = use_cookie ? dirent->cookie : (u64)dirent->offset;
                         /* Cookie was found */
-                        if ((use_cookie?dirent->cookie:dirent->offset) > off) {
+                        if (cookie > (u64)off) {
                                 *entry = *dirent;
                                 dirent->page = NULL;
                                 break;
@@ -325,6 +328,12 @@
                         plus = 0;
                         res = 0;
                 }
+                /* If we're using the page offset scheme, and the page cache
+                 * is invalidated, we just restart reading at the next entry
+                 * since the file position pointer is going to be screwed up
+                 * anyway */
+                if (!use_cookie && off > dirent->offset)
+                        off = dirent->offset;
         }
         if (dirent->page)
                 page_cache_release(dirent->page);
@@ -375,6 +384,12 @@
                 pg_offset = p - start;
                 entry->prev = entry->offset;
                 entry->offset = base + pg_offset;
+                /* Is the server violating our monotonicity assumption ? */
+                if (use_cookie && entry->prev_cookie > entry->cookie) {
+                        use_cookie = 0;
+                        NFS_SERVER(inode)->flags |= NFS_NONMONOTONE_COOKIES;
+                        file->f_pos = 0;
+                }
                 if (loop_count++ > 200) {
                         loop_count = 0;
                         schedule();



