Re: NFS regression? Odd delays and lockups accessing an NFS export.

From: Grant Coady
Date: Mon Aug 25 2008 - 21:07:28 EST


On Mon, 25 Aug 2008 18:11:12 -0400, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:

>On Tue, 2008-08-26 at 06:23 +1000, Grant Coady wrote:
>> On Fri, 22 Aug 2008 14:56:53 -0700, Trond Myklebust <trond.myklebust@xxxxxxxxxx> wrote:
>>
>> >On Fri, 2008-08-22 at 22:37 +0100, Ian Campbell wrote:
>> >> I can ssh to the server fine. The same server also serves my NFS home
>> >> directory to the box I'm writing this from and I've not seen any trouble
>> >> with this box at all, it's a 2.6.18-xen box.
>> >
>> >OK... Are you able to reproduce the problem reliably?
>> >
>> >If so, can you provide me with a binary tcpdump or wireshark dump? If
>> >using tcpdump, then please use something like
>> >
>> > tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049
>> ^^^^^^^^--> typo?
>
>No. The intention was to record _all_ the info in the packet for
>analysis, not just random header info.

I booted 2.6.27-rc4-git4a and started the tcpdump trace as described in
last email, got an immediate stall which cleared after 10-20 seconds, on
running a small script exported from the NFS server as user:

~$ cat /home/common/bin/boot-report
#! /bin/bash
#
version=$(uname -r)

dmesg >/home/common/dmesg-$version

# 2006-12-15 no more filter .config
#grep = /boot/config-$version >/home/common/config-$version
cp /boot/config-$version /home/common


The script had not created the file on the server when the command prompt
returned, during the stall I also opened another client terminal and ran
'df', it stalled too, and both client terminals came back to prompt after
the 10-20s delay. Then I hit Ctrl-C on the tcpdump terminal to prepare
this report.

The trace is only 27 lines, deltree is the server, pooh64 the client:
grant@pooh64:~$ /usr/sbin/tcpdump -r /tmp/dump.out00
reading from file /tmp/dump.out00, link-type EN10MB (Ethernet)
10:42:14.061410 IP pooh64.mire.mine.nu.0 > deltree.mire.mine.nu.nfs: 0 null
10:42:14.062691 IP deltree.mire.mine.nu.nfsd > pooh64.mire.mine.nu.839: S 2746115414:2746115414(0) ack 3659136038 win 5840 <mss 1460,nop,nop,sackOK>
10:42:14.062708 IP pooh64.mire.mine.nu.839 > deltree.mire.mine.nu.nfsd: . ack 1 win 5840
10:42:14.062750 IP pooh64.mire.mine.nu.1392480046 > deltree.mire.mine.nu.nfs: 112 getattr fh 0,1/218104576
10:42:14.062765 IP pooh64.mire.mine.nu.1409257262 > deltree.mire.mine.nu.nfs: 112 fsstat fh 0,1/218104576
10:42:14.062944 IP deltree.mire.mine.nu.nfsd > pooh64.mire.mine.nu.839: . ack 113 win 5840
10:42:14.062951 IP deltree.mire.mine.nu.nfsd > pooh64.mire.mine.nu.839: . ack 225 win 5840
10:42:14.063873 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1392480046: reply ok 116 getattr DIR 40755 ids 1000/10 sz 448
10:42:14.063881 IP pooh64.mire.mine.nu.839 > deltree.mire.mine.nu.nfsd: . ack 117 win 5840
10:42:14.063890 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1409257262: reply ok 88 fsstat tbytes 8422359040 fbytes 6605647872 abytes 6605647872
10:42:14.063893 IP pooh64.mire.mine.nu.839 > deltree.mire.mine.nu.nfsd: . ack 205 win 5840
10:42:14.064319 IP pooh64.mire.mine.nu.1426034478 > deltree.mire.mine.nu.nfs: 116 access fh 0,1/218104576 001f
10:42:14.064677 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1426034478: reply ok 124 access c 001f
10:42:14.064732 IP pooh64.mire.mine.nu.1442811694 > deltree.mire.mine.nu.nfs: 136 readdirplus fh 0,1/218104576 512 bytes @ 0
10:42:14.067193 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1442811694: reply ok 1460 readdirplus
10:42:14.067203 IP pooh64.mire.mine.nu.839 > deltree.mire.mine.nu.nfsd: . ack 1789 win 8760
10:42:14.067208 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.841889326: reply ok 1324
10:42:14.067328 IP pooh64.mire.mine.nu.1459588910 > deltree.mire.mine.nu.nfs: 112 getattr fh Unknown/010000000003000D0200000000000024010000060003000D02000000FE6F0000
10:42:14.068572 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1459588910: reply ok 116 getattr DIR 40755 ids 1000/10 sz 448
10:42:14.068619 IP pooh64.mire.mine.nu.1476366126 > deltree.mire.mine.nu.nfs: 136 getattr fh Unknown/010000060003000D02000000676A000002000000000000000200000001000000
10:42:14.070615 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1476366126: reply ok 116 getattr DIR 40755 ids 1000/10 sz 360
10:42:14.070652 IP pooh64.mire.mine.nu.1493143342 > deltree.mire.mine.nu.nfs: 140 access fh Unknown/010000060003000D02000000676A000002000000000000000200000001000000 001f
10:42:14.071576 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1493143342: reply ok 124 access c 001f
10:42:14.071638 IP pooh64.mire.mine.nu.1509920558 > deltree.mire.mine.nu.nfs: 160 readdirplus fh Unknown/010000060003000D02000000676A000002000000000000000200000001000000 512 bytes @ 0
10:42:14.073382 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.1509920558: reply ok 1460 readdirplus
10:42:14.074096 IP deltree.mire.mine.nu.nfs > pooh64.mire.mine.nu.0: reply Unknown rpc response code=6 516
10:42:14.074102 IP pooh64.mire.mine.nu.839 > deltree.mire.mine.nu.nfsd: . ack 5445 win 17520

Sorry I forgot to do the netstat -t.

Attached the tcpdump file in case there's more info in it.

config and dmesg for client at:
http://bugsplatter.id.au/kernel/boxen/pooh64/config-2.6.27-rc4-git4a
http://bugsplatter.id.au/kernel/boxen/pooh64/dmesg-2.6.27-rc4-git4a

for server:
http://bugsplatter.id.au/kernel/boxen/deltree/config-2.6.24.7a.gz
http://bugsplatter.id.au/kernel/boxen/deltree/dmesg-2.6.24.7a.gz

Grant.

Attachment: dump.out00.gz
Description: pooh64-2.6.27-rc4-git4a