Re: TCP Stall

Eric.Schenk@dna.lth.se
Mon, 31 Mar 1997 12:49:46 +0200

Next message: Eric.Schenk@dna.lth.se: "Re: TCP Stall"
Previous message: Mark Hemment: "Re: ext2 filesystem corruption?!?!??"
Maybe in reply to: Richard B. Johnson: "TCP Stall"
Next in thread: Eric.Schenk@dna.lth.se: "Re: TCP Stall"

"Richard B. Johnson" <root@analogic.com> writes:
> I have been looking into the TCP Stall problem when FTPing
> files between Linux machines via a PPP link. This problem
> also occurs with remotely mounted file-systems. I have
> reported this problem over 10 times during the past two
> years and I have recorded at least 25 other instances in
> which other users have reported the same problem.

I've been actively asking people to report these problems to me
for about a year. I have not been able to get any reports that have
actual traces of the problem so I can see what is happening, nor
have I been able to reproduce the problems. It's not that we don't
care about these problems, but we need better reporting than "I saw a
pause". We need traces, preferably from both ends of the link, and
even better, taken with Ethernet sniffing hardware rather than on
the boxes doing the sending and receiving. We also need reporters
to make some effort to be sure that the problem is not in a lower
communications layer. (I'm not pointing any fingers here, this is just
a general trend I've observed.) I've had lots of reports with traces
where I could eventually figure the problem out, and the problem usually
turned out to be either a problem in the networking configuration, or
running Solaris on the far end. (For those who don't know: Solaris has
some serious TCP throughput problems over slow links. Sun has a patch
set out to fix them. Install these patches on your Solaris boxes, and
encourage others to do the same!)

> I dump all communications on specific problems that I have
> encountered into separate Pine "folders", so it's easy to
> maintain a history of a specific problem.

Great, can I please have a copy? I want to compare it against my
own archives on this problem.

> Apparently this problem is not considered important because
> absolutely nothing has been done about it for over two
> years. There have been no experimental patches from Network
> gurus attempting to fix this very real and very troublesome
> problem.

There have been patches, but again, unless I can reproduce the problem
or get good traces of it, it is hard to guess what is going on.
One thing: if you are only watching linux-kernel, you will not see
much traffic about developments in the networking code. You need
to watch linux-net for that.

> The problem is that data being transferred between links
> that use megabit speeds and links that use kilo-bit speeds
> needs flow control.

I've seen this with the 2.1.x kernels recently. David and I know
about it, and we are working on it. If this is happening with the
pre-2.0.30 kernels after the next release I want to know about it.
The more tracing you can give me the better.

> Normally, I see the window set at 24,820 (right-hand edge).
> I don't know why. Perhaps someone determined that it was
> optimum. I observe that when the receive buffer gets full on
> the machine that is routing packets to my PPP link, the
> window abruptly goes to zero (0). This is okay, it means "I
> don't have any more room". It could have slowly closed, but
> it doesn't. When the window is zero, the machine attempting
> to send data to the router, stops sending data. This is
> correct. It is not allowed to send data when there is no
> room for it. It CAN send packets, however it MUST NOT send
> packets containing data.

Which kernel do you observe this against? It's an interesting data
point, but I need to know which code I have to look at.
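
The sender-side rule described above (no new data once the advertised
window is zero, only probes) can be sketched in C. This is a minimal
illustration assuming the RFC 793 variable names SND.UNA, SND.NXT and
SND.WND; the struct, its `queued` field and the function names are
hypothetical and are not taken from the Linux sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender state, named after the RFC 793 variables. */
struct tcp_send_state {
    uint32_t snd_una; /* oldest unacknowledged sequence number */
    uint32_t snd_nxt; /* next sequence number to be sent */
    uint32_t snd_wnd; /* window last advertised by the receiver */
    uint32_t queued;  /* bytes waiting in the send buffer, not yet sent */
};

/* Bytes of new data the sender may transmit now. Data may be sent up
 * to snd_una + snd_wnd; unsigned subtraction copes with sequence
 * number wraparound. */
static uint32_t usable_window(const struct tcp_send_state *s)
{
    uint32_t in_flight = s->snd_nxt - s->snd_una;

    if (in_flight >= s->snd_wnd)
        return 0; /* window full, zero, or shrunk: send no new data */
    return s->snd_wnd - in_flight;
}

/* True when the sender is blocked only by a zero window: data is
 * queued, nothing is in flight to be retransmitted, and the window is
 * closed. This is the state in which it should send zero window probes. */
static bool needs_zero_window_probe(const struct tcp_send_state *s)
{
    return s->queued > 0 && s->snd_wnd == 0 && s->snd_nxt == s->snd_una;
}
```

Comparing sequence numbers through an unsigned 32-bit difference, as
`usable_window` does, is the usual way to stay correct when the
sequence space wraps.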

> Now, how does the machine that received a window of zero
> know that buffers are available again? I watch the Sun send
> a SYN.

WHAT!? Are you sure? A SYN packet starts a new connection.
If it arrives at the same port as the previous connection, it should
elicit an RST. If it's not at the same port, you are just seeing a new
connection. The Sun should be sending zero window probes instead.
If you still have the traces on this one I want to see them.
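
The RFC 793 rule behind that RST can be sketched as follows: in a
synchronized state such as ESTABLISHED, a SYN whose sequence number
falls inside the receive window is an error answered with a reset,
while an out-of-window segment only draws an ACK. The sketch assumes
the segment has already been matched to an existing connection by
address and port; the enum and function names are hypothetical.
(Much later, RFC 5961 recommended answering any such SYN with a
"challenge ACK" instead.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical response codes for this sketch. */
enum syn_response {
    SYN_SEND_ACK, /* segment outside the window: re-ACK and drop it */
    SYN_SEND_RST  /* in-window SYN on a live connection: reset it */
};

/* True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2^32.
 * A zero window contains no sequence numbers at all. */
static bool seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    return (uint32_t)(seq - rcv_nxt) < rcv_wnd;
}

/* Response of an ESTABLISHED connection to a bare SYN (segment length
 * 1, so checking its first sequence number is exact), per RFC 793. */
static enum syn_response on_syn_in_established(uint32_t seq,
                                               uint32_t rcv_nxt,
                                               uint32_t rcv_wnd)
{
    if (!seq_in_window(seq, rcv_nxt, rcv_wnd))
        return SYN_SEND_ACK;
    return SYN_SEND_RST;
}
```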

> It receives an ACK with the new window. I don't know
> if this is the correct thing to do according to the RFCs,
> but it works. It is likely that the routing machine, i.e.,
> the one that has buffers loaded with data, trying to free
> them by getting the data squeezed into the PPP link, should
> be the machine to send a SYN when buffers are available
> again.

Not quite. When the buffers drain, the receiver should send an ACK
with the new window. Note that this ACK can get lost, so the remote
side must also send periodic zero window probes.
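
When a drained buffer is worth announcing can be sketched with the
receiver silly window syndrome (SWS) avoidance rule from RFC 1122
section 4.2.3.3: the window is reopened only once the free space
exceeds the current offer by at least min(RCV.BUFF / 2, MSS). This is
an illustration of the RFC rule under hypothetical struct and
function names, not the Linux implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical receiver state; fields mirror RFC 1122 section 4.2.3.3.
 * Invariant: rcv_user + rcv_wnd <= rcv_buff. */
struct tcp_recv_state {
    uint32_t rcv_buff; /* total receive buffer space */
    uint32_t rcv_user; /* bytes received but not yet read by the application */
    uint32_t rcv_wnd;  /* window currently offered to the sender */
    uint32_t mss;      /* effective MSS of the sending side */
};

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* A segment of len bytes arrives: it fills the buffer and uses up
 * window, so the right window edge stays where it was. */
static void on_segment(struct tcp_recv_state *r, uint32_t len)
{
    assert(len <= r->rcv_wnd); /* the sender must respect the window */
    r->rcv_user += len;
    r->rcv_wnd -= len;
}

/* The application reads len bytes. Returns 1 if the receiver should now
 * send an ACK advertising a larger window, 0 if the increase is too
 * small to announce yet (receiver SWS avoidance with Fr = 1/2). */
static int on_app_read(struct tcp_recv_state *r, uint32_t len)
{
    assert(len <= r->rcv_user);
    r->rcv_user -= len;

    uint32_t free_space = r->rcv_buff - r->rcv_user; /* >= rcv_wnd */
    uint32_t threshold = min_u32(r->rcv_buff / 2, r->mss);

    if (free_space - r->rcv_wnd >= threshold) {
        r->rcv_wnd = free_space; /* advance the right window edge */
        return 1;
    }
    return 0;
}
```

With a 32 KB buffer and a 1460-byte MSS, for example, a full buffer
that drains by only 1000 bytes stays at a zero window; a second
1000-byte read pushes the gain past 1460 bytes and triggers the update.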

> RFC-1122 defines a standard way to "probe" for the new
> window after the window has shrunk to zero. This is shown in
> 4.2.2.17.
>
> This does not appear to happen with the Linux machines

Which kernel(s)? Also, the receiver does not probe; the sender
does. If the sender is failing to probe, it is not our problem.
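
RFC 1122 section 4.2.2.17 asks the sender to send the first zero
window probe after one retransmission timeout and to increase the
interval between later probes exponentially. A minimal sketch of that
schedule follows; the 120-second cap (`PERSIST_MAX_MS`) and the
function name are assumptions chosen for illustration, not values
taken from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap on the probe interval, for illustration only. */
#define PERSIST_MAX_MS 120000u

/* Delay before the next zero window probe. rto_ms is the current
 * retransmission timeout; probes_sent counts the probes already sent
 * since the window closed. The first probe waits one RTO and every
 * later probe doubles the wait, up to PERSIST_MAX_MS. */
static uint32_t persist_interval_ms(uint32_t rto_ms, unsigned probes_sent)
{
    assert(rto_ms > 0);

    uint32_t interval = rto_ms;
    for (unsigned i = 0; i < probes_sent && interval < PERSIST_MAX_MS; i++)
        interval *= 2; /* cannot overflow: interval < PERSIST_MAX_MS here */
    return interval < PERSIST_MAX_MS ? interval : PERSIST_MAX_MS;
}
```

With `rto_ms = 3000`, the waits run 3, 6, 12, 24, 48 and 96 seconds,
then stay at 120 seconds, so a sender following this schedule should
never go silent for the 30 minutes Richard reports.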

> although it has been confirmed that "tcpdump" will
> randomly drop packets, and often the important ones for
> which you are watching.

How have you confirmed that tcpdump is losing packets? [The way I have
confirmed this in the past is to write the equivalent of tcpdump into
the kernel and dump information to the system logs. Not pretty.
This led to some bugs in the SOCK_PACKET implementation being fixed.
I have not seen tcpdump lose packets since. This was about a year ago.]

> The machine will stall for as much as 30 minutes until the
> sender re-sends an unsolicited data packet (Yes, a packet
> with data even though the window was closed). The packet is
> ACKed with the new window and normal data-flow restarts
> until the router's buffer is full again. This continues
> until the file has finally been sent.

The sending machine should have been sending zero window probes
throughout.

> The result is that a 1/2 megabyte file will take up to 2
> hours to be sent on a 56 kb link. Sun's "snoop" seems to be
> a lot better at looking for problems than "tcpdump". Tcpdump
> seems to lose a lot of packets. It also fails to interpret
> some of them.

Huh? If tcpdump is seeing garbage packets, I would suspect the networking
hardware. Also, beware of assuming that the packets you see getting sent
on the Sun actually are making it to your Linux box in one piece.
Almost anything could happen to them before they hit the Linux box.
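
For scale, a rough calculation shows how far the reported figures are
from the link's capacity. It assumes "1/2 megabyte" means 2^19 bytes
and "56 kb" means 56,000 bit/s, and it ignores PPP, IP and TCP header
overhead, so the ideal time is a lower bound:

```latex
\begin{aligned}
\text{file size} &= 2^{19}\ \text{bytes} \times 8 = 4\,194\,304\ \text{bits} \\
t_{\text{ideal}} &= \frac{4\,194\,304\ \text{bits}}{56\,000\ \text{bit/s}} \approx 75\ \text{s} \\
\text{throughput}_{\text{observed}} &= \frac{4\,194\,304\ \text{bits}}{7\,200\ \text{s}} \approx 583\ \text{bit/s}
\end{aligned}
```

A two-hour transfer is therefore roughly 96 times slower than the
link allows, about 1% of its capacity, which is consistent with long
idle stalls rather than a merely slow link.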

> When looking for network problems, beware of
> tcpdump. It is not a very good tool. Perhaps if its captured
> binary data were first written to a file, it would not lose
> so much information.

Read the tcpdump manual page: you can ask tcpdump to do this (the -w
option writes the raw packets to a file, and -r reads them back).

BTW, here's a small list of problems that can masquerade as TCP layer
problems:

(1) You are using USR Sportster modems without the pause bug fix.
In tcpdump traces this shows up as repeated long pauses in the data
being sent.

(2) The VJ (Van Jacobson) header compression layer at one end or the
other of your PPP link is broken. This results in lots of packets
getting thrown out. Turn off VJ compression to test this. BTW, I've
been somewhat suspicious of the Linux VJ compression implementation
for some time, but I can't produce any definite cases where it does
the wrong thing.

-- 
Eric Schenk                               www: http://www.dna.lth.se/~erics
Dept. of Comp. Sci., Lund University          email: Eric.Schenk@dna.lth.se
Box 118, S-221 00 LUND, Sweden   fax: +46-46 13 10 21  ph: +46-46 222 96 38
