NFS client performance 1.5 orders of magnitude too slow?

Godmar Back (gback@cs.utah.edu)
Tue, 9 Mar 1999 00:44:49 -0700 (MST)


Hi,

I have heard that the Linux NFS client doesn't have the best reputation.
However, I really couldn't believe the following results I got.
The following program writes 8,192 bytes, 1 byte at a time.

#include <fcntl.h>

void main(int ac, char *av[])
{
int fd = open(av[1], O_WRONLY|O_CREAT, 0777);
int i;

for (i = 0; i < 8192; i++) {
write(fd, "0", 1);
}
close(fd);
}

>From two machines with an identical LAN network connection to a SunOS 5.6
server, the program finishes instantaneously on a FreeBSD 3.0 machine:

time ./tw.f3 A1
0.008u 0.091s 0:00.10 90.0% 5+176k 0+1io 1pf+0w

On a Linux 2.2.3/RH 5.2 machine, however, it takes

time ./tw.linux B1
0.000u 0.840s 0:51.16 1.6% 0+0k 0+0io 64pf+0w

51 seconds!

Please tell me that I'm doing something wrong here.
Here's an excerpt from the tcpdump:

23:50:34.178183 peerless.cs.utah.edu.3079405568 > envy.cs.utah.edu.nfs: 172 writ
e [|nfs]
23:50:34.183063 envy.cs.utah.edu.nfs > peerless.cs.utah.edu.3079405568: reply ok
96 write [|nfs] (DF)
23:50:34.183136 peerless.cs.utah.edu.3096182784 > envy.cs.utah.edu.nfs: 152 geta
ttr [|nfs]
23:50:34.183647 envy.cs.utah.edu.nfs > peerless.cs.utah.edu.3096182784: reply ok
96 getattr [|nfs] (DF)
23:50:34.183719 peerless.cs.utah.edu.3112960000 > envy.cs.utah.edu.nfs: 172 writ
e [|nfs]

Linux issues 8,192 writes.
Here's a back of the envelope calculation for the above tcpdump excerpt:

4880 us between write req & write reply
73 us between write reply & getattr req
511 us between getattr req & getattr reply
72 us between getattr reply & next write req

Makes roughly 5536 us per request, which about adds up to 40-50 seconds
for 8192 requests.

So, I guess I have two questions:

+ Why does Linux do the getattr following the write?
+ Assuming the write takes so long (almost 5 ms) because the server does
synchronous writes to disk, then the issue is that Linux doesn't aggregate
write requests?

I assume this is what the README file for nfsserver2.2beta40
means when it says:

* NFS operation between two Linux boxes can be quite
slow. There are a number of reasons for this, only one
of which is that unfsd runs in user space. Another
(and much worse) problem is that the Linux NFS client
code currently does no proper caching, read-ahead and
write-behind of NFS data. This problem can be helped
by increasing the RPC transfer size on the client by
adding the `rsize=8192,wsize=8192' mount options. This
will at least improve throughput when reading or
writing large files. You are still in a lose-lose
situation when applications write data line by line or
with no output buffering at all.

Is anybody working on fixing this situation? I had heard rumors that
this had finally been fixed in 2.2.x, but apparently it hasn't.
This seems like a showstopper for Linux to me. This would be sad
since I rather use Linux than FreeBSD, but I may not be able to
afford the wait when I'm dealing with applications that are unaware
of this problem and don't do their own buffering.

- Godmar

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/