Re: NFS and kernel 2.6.x

From: Jamie Lokier
Date: Sat Apr 17 2004 - 22:30:45 EST

Next message: Nick Piggin: "Re: vmscan.c heuristic adjustment for smaller systems"
Previous message: Larry McVoy: "NFS exporting imports?"
In reply to: Trond Myklebust: "Re: NFS and kernel 2.6.x"
Next in thread: Trond Myklebust: "Re: NFS and kernel 2.6.x"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Trond Myklebust wrote:
> With this patch
> - the major timeout is of fixed length "timeo<<retrans", and the
> clock starts at the first attempt to send the packet.
> - If a major timeout occurs, we now reset the RTT estimator so
> as to "slow start" when the server becomes available again.
>
> For the moment it does use the timeo + retrans values, because the
> former is in fact wanted in order to initialize the RTT estimator.
> However, it no longer uses the count of the number of actual
> retransmissions in order to determine whether or not a major timeout
> occurred.

Ok, observations:

- The RTT converges to 0.1s on my LAN, just as it did before the patch.
Very sensible, and as you said the 100 microsecond problem is not
with us these days.

- The RTT is reset after a timeout (from 0.1-0.15s to 0.7s in my tests).
As expected.

- With the defaults (retrans=3, timeo=0.7s), I see:

After disconnecting the server, the client first times out after
about 5.5-6 seconds. The first minor timeout is 0.1s. This makes sense
as 0.7 << 3 == 5.6.

Subsequent timeouts take about 10.5 seconds. This also makes sense,
as you have set the timeout threshold at 0.7*8 == 5.6 seconds,
and three timeouts is 0.7*(1+2+4) == 4.9 seconds, too short.
Four timeouts is 0.7*(1+2+4+8) == 10.5 seconds.

The old behaviour before RTT estimation would have timed out
after 10.5 seconds, I think.

- With retrans=5 and timeo still at its default value of 0.7s:

After disconnecting the server, the minor timeout intervals are
approximately:

0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2 seconds.

Are they intended to stop doubling at 3.2? The major timeout
thus happens after 22.3 seconds.

Unsurprisingly, subsequent major timeouts take 44.1 seconds.
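
The arithmetic behind the 10.5 and 44.1 second figures above can be checked with a small sketch. This is not the kernel code: it is a toy model that assumes each minor timeout doubles the previous one, and that a major timeout is declared at the first minor timeout expiring at or after timeo << retrans from the first send. The function name and the millisecond units are illustrative assumptions, and the 3.2s plateau observed with a 0.1s starting RTT is not modelled.

```python
def time_to_major_timeout(first_ms, timeo_ms, retrans):
    """Toy model: double each minor timeout until the elapsed time
    reaches the fixed major-timeout threshold (timeo << retrans).

    Returns (list of minor timeout intervals, total elapsed), in ms.
    """
    if first_ms <= 0 or timeo_ms <= 0 or retrans < 0:
        raise ValueError("timeouts must be positive, retrans non-negative")
    threshold = timeo_ms << retrans
    intervals = []
    elapsed = 0
    cur = first_ms
    while elapsed < threshold:
        intervals.append(cur)   # wait one minor timeout
        elapsed += cur
        cur *= 2                # plain exponential backoff, no cap
    return intervals, elapsed


# RTT estimator already reset to timeo (0.7s), as after a major timeout.
for retrans in (3, 5):
    intervals, elapsed = time_to_major_timeout(700, 700, retrans)
    print(retrans, intervals, elapsed / 1000)
```

With retrans=3 the threshold is 5.6s, three timeouts only reach 4.9s, and the fourth brings the total to 10.5s; with retrans=5 the same reasoning gives 44.1s.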

So this patch is a big improvement, and I'm going to keep using it for my home
directory with retrans=5,soft so it gets some more background testing.
(retrans=3 is too short even with the patch).

However, there are potential improvements. One is that the 3.2 above
should continue doubling. The other is that behaviour would be nicer
if the major timeout time was more predictable: 22.3 to 44.1 seconds
is a big range. This is easy with the algorithm described below.

It isn't possible to remove the variation completely. However,
it can easily be reduced by changing the doubling strategy: keep
doubling the retransmit time, until it exceeds timeo. When that
happens, set the retransmit time to the next greater or equal value of
timeo << N for some integer N.

For example, with RTT at 0.1s, retrans=5, timeo=0.7, these would be
the minor timeout intervals:

0.1, 0.2, 0.4, 0.7, 1.4, 2.8, 5.6, 11.2, 22.4

leading to a total major timeout time of 44.8 seconds.

Subsequent major timeouts, with the RTT reset to 0.7s, would take 44.1
seconds: 0.7, 1.4, 2.8, 5.6, 11.2, 22.4.

If the RTT estimator is larger than timeo to start with, the first
retransmit will time out after RTT, but subsequent ones will be a value
of timeo << N. E.g. if RTT was 2s, this would be the minor timeout
sequence: 2.0, 2.8, 5.6, 11.2, 22.4.

The algorithm for deciding when a major timeout occurs is different
too. Instead of keeping track of the total time since the very first
transmission, you simply deem the major timeout to occur after the
minor timeout of timeo << retrans occurs. I.e. in these examples, the
22.4s minor timeout is always the final one.
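
As a rough illustration, the proposed schedule could be sketched as follows. This is one reading of the rule that reproduces the three example sequences above, not code from any patch: the retransmit time doubles while the doubled value does not exceed timeo, and otherwise moves to the smallest timeo << N that is strictly greater than the current value. The function name and the millisecond units are assumptions made for the example.

```python
def proposed_minor_timeouts(rtt_ms, timeo_ms, retrans):
    """Sketch of the proposed minor timeout sequence, in ms.

    The sequence starts at the current RTT estimate and ends with the
    interval timeo << retrans, whose expiry is the major timeout.
    """
    if rtt_ms <= 0 or timeo_ms <= 0 or retrans < 0:
        raise ValueError("timeouts must be positive, retrans non-negative")
    final = timeo_ms << retrans
    cur = rtt_ms
    intervals = [cur]
    while cur < final:
        doubled = 2 * cur
        if doubled <= timeo_ms:
            # Still at or below timeo: ordinary doubling.
            cur = doubled
        else:
            # Snap onto the timeo << N grid: the smallest grid value
            # strictly greater than the current interval.
            step = timeo_ms
            while step <= cur:
                step <<= 1
            cur = step
        intervals.append(cur)
    return intervals


for rtt in (100, 700, 2000):
    seq = proposed_minor_timeouts(rtt, 700, 5)
    print(rtt, [ms / 1000 for ms in seq], sum(seq) / 1000)
```

Because the last interval is always timeo << retrans, the caller only needs to compare the interval that just expired against that value; no running total since the first transmission has to be kept.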

This reduces the possible variation, with these parameters, to the
range 44.1 to 45.325 seconds: much more consistent than 22.05 to 44.1
seconds.

As well as giving more consistent results, this might even be simpler
than the algorithm in your patch, because there is no need to remember
the total time since the first transmission.

-- Jamie
