Re: 2.6.26.3 kernel - progressive slowdown over NFS

From: David Flynn
Date: Tue Dec 09 2008 - 11:45:20 EST


On 2008-09-10, Priyank Patel <pkpatel.lists@xxxxxxxxx> wrote:
> We have a simple python program which keeps running a C loop to lstat
> NFS mounted directories. We are seeing some weird behavior w.r.t. the
> run-time of this program on 2.6.26.3 kernel vs 2.6.24 kernel.
>
> The run-time of the following code increases over time on the 2.6.26.3
> kernel, whereas remains flat (as expected) on the 2.6.24 kernel.

I'm seeing a similar effect, and ran a benchmark pre and post reboot:

$ strace -T /usr/sbin/bonnie++ -d . -s 0 -f -n 1 >/tmp/bonnie-r44237-netslow 2>&1
...reboot...
$ strace -T /usr/sbin/bonnie++ -d . -s 0 -f -n 1 >/tmp/bonnie-r44237-netfast 2>&1

Graphs of the operations are avaliable:
http://davidf.woaf.net/nfsfail-2.6.26/

In particular http://davidf.woaf.net/nfsfail-2.6.26/r44237-stat.pdf

r44237-* was a machine with a 28 day uptime and
r44088-* is an identical machine with 14 day uptime.

The graphs show times as recorded by strace for each syscall (points), a
cumulative frequency plot is also drawn on the same graph (lines).
yellow-orange points/lines are before the reboot, purple afterwards.

The machines are part of a cluster with a r/o nfsroot and common debian
stock kernel:
Linux r44088 2.6.26-1-amd64 #1 SMP Sat Aug 2 11:15:08 GMT 2008 x86_64 GNU/Linux

Other nfs filesystems are also mounted; all nfs mounts are nfsv3 over udp.

Has this issue been identified or resolved in 2.6.27?

Extra logs can be provided if required.

..david

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/