Re: NFS probs w/ NetApp

From: Trond Myklebust (trond.myklebust@fys.uio.no)
Date: Thu Mar 16 2000 - 16:08:35 EST


>>>>> " " == Andrew Morton <morton@nortelnetworks.com> writes:

> We're having some problems talking NFS to a NetApp server
> (kernel 2.2.5 and 2.2.13). I'd appreciate it if an NFS guru
> could help out with some trace interpretation. I'm also
> wondering if there's a timeout handling buglet in the kernel.

> Mount options (autofs):

> /automounts/netapp /etc/auto.netapp
> -soft,rw,bg,intr,rsize=8192,wsize=8192,retry=5,retrans=130,timeo=14,noquota,nosuid

> tcpdump output for a large read (zwoln001 is the netapp):

> 18:14:38.420863 zwoln001 > pwold011: (frag 63753:1244@2960)
> 18:14:38.421020 pwold011.175244598 > zwoln001.nfs: 152 read
> [|nfs] 18:14:38.422770 zwoln001.nfs > pwold011.175244598: reply
> ok 1472 read [|nfs] (frag 64009:1480@0+) 18:14:38.424002
> zwoln001 > pwold011: (frag 64009:1480@1480+) 18:14:38.425041
> zwoln001 > pwold011: (frag 64009:1244@2960) 18:14:38.436889
> pwold011.192021814 > zwoln001.nfs: 152 read [|nfs]
> 18:14:38.438659 zwoln001.nfs > pwold011.192021814: reply ok
> 1472 read [|nfs] (frag 64265:1480@0+) 18:14:39.830059
> pwold011.192021814 > zwoln001.nfs: 152 read [|nfs]
>>>>> [ timeout ]
> 18:14:42.630092 pwold011.192021814 > zwoln001.nfs: 152 read
> [|nfs] 18:14:42.631824 zwoln001.nfs > pwold011.192021814: reply
> ok 1472 read [|nfs] (frag 64521:1480@0+) 18:14:42.633054
> zwoln001 > pwold011: (frag 64521:1480@1480+) 18:14:42.634094
> zwoln001 > pwold011: (frag 64521:1244@2960)

> ...

> 18:14:42.793453 zwoln001.nfs > pwold011.728892726: reply ok
> 1472 read [|nfs] (frag 7178:1480@0+) 18:14:42.794684 zwoln001 >
> pwold011: (frag 7178:1480@1480+) 18:14:42.795723 zwoln001 >
> pwold011: (frag 7178:1244@2960) 18:14:42.795877
> pwold011.745669942 > zwoln001.nfs: 152 read [|nfs]
> 18:14:42.797627 zwoln001.nfs > pwold011.745669942: reply ok
> 1472 read [|nfs] (frag 7434:1480@0+) 18:14:42.798858 zwoln001 >
> pwold011: (frag 7434:1480@1480+) 18:14:42.799898 zwoln001 >
> pwold011: (frag 7434:1244@2960) 18:14:42.801114
> pwold011.762447158 > zwoln001.nfs: 152 read [|nfs]
> 18:14:42.803039 zwoln001 > pwold011: (frag 7690:1480@1480+)
> 18:14:42.804079 zwoln001 > pwold011: (frag 7690:1244@2960)
>>>>> [another timeout]
> 18:14:44.200164 pwold011.762447158 > zwoln001.nfs: 152 read
> [|nfs] 18:14:44.201945 zwoln001.nfs > pwold011.762447158: reply
> ok 1472 read [|nfs] (frag 7946:1480@0+) 18:14:44.203175
> zwoln001 > pwold011: (frag 7946:1480@1480+) 18:14:44.204214
> zwoln001 > pwold011: (frag 7946:1244@2960) 18:14:44.205896
> pwold011.779224374 > zwoln001.nfs: 152 read [|nfs]

> I assume the first timeout is simply due to the read request
> getting lost - could be n/w or server probs. Correct?

Yup.

> But the second retry at 18:14:44.200164 looks odd because the
> read seems to have been satisfied.

Nope. The fragment with offset 0 has been lost. My guess is that
you're having problems with a noisy network.

> A couple of questions:

> - How does one enable the 'chatty' mode on the client so we get
> some syslog output?

echo xxxx >/proc/sys/sunrpc/yyy_debug

Where yyy can be nfs, nlm, nfsd or rpc.

and xxxx depends on what you wish to debug: see the values of
NFSDBG_FACILITY in the respective source files.

For instance, in order to debug the Sunrpc socket code in
net/sunrpc/xprt.c, you will want to

echo 1 >/proc/sys/sunrpc/rpc_debug

It is possible to 'or' values from different files.

> - Why are these retries only coming out after 2.8 seconds, when
> the 'timeo' option was set to 1.4 seconds?

I'm not sure. It's been too long since I looked at the stock 2.2.x RPC
code.

Cheers, and good luck,
   Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:19 EST