Re: TCP Stalls.

Mark Gray (markgray@iago.nac.net)
03 Sep 1998 05:42:02 -0400


"Richard B. Johnson" <root@chaos.analogic.com> writes:

>
> On 2 Sep 1998, Mark Gray wrote:
>> "Richard B. Johnson" <root@chaos.analogic.com> writes:
>>> On Wed, 2 Sep 1998, Alan Cox wrote:
>>>
>>
>> Try using a smaller window for ppp (works for me.)

[snip]

> Yes. Thanks. I was not looking for a work-around. I would like to
> see some effort made to find this long-standing problem. When I
> connect with my Sun it is slow. However nothing stalls even when
> using a Linux machine at home. If I connect linux-to-linux I have
> a problem -- always. If I connect with an ISP, I get what I deserve ;)
>
> Cheers,
> Dick Johnson

For what it might be worth, I just tried ftp'ing my "fail every time"
test file from an ISP which used to have trouble with the default
window, and there is no longer any problem. They are a beta test site
for Livingston/Lucent, so they get a fair turnover in new firmware, and
it could have been fixed on their end.

I had every intention of fixing it right for Linux when I started
researching the problem, but in the middle of reading up on it I
noticed that OpenBSD used a window of 16384. I tried it and the
problem went away, so I never dug very far into the code to figure out
what was going on. I posted my "kludge" on c.o.l.n several months ago
in response to a thread about ppp stalls (along with a crackpot theory
of why it worked :-) and got 4 or 5 reports that it fixed the problem.

My problem with investigating it is that I have only one modem, so I
could only ever see my side of the tcpdump, and now that my ISP has
fixed their end I don't even have that. From memory, it used to look
like this: a packet got dropped, my ISP would send data further along,
Linux would ACK the last good sequence number it had received, but my
ISP would act as if it never received that ACK and keep sending data
further along at a slower and slower rate until the link timed out.
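
The symptom above (the receiver repeating an ACK for the last good
sequence number while the sender carries on regardless) can be picked
out of a one-sided trace mechanically. A rough sketch, assuming the ACK
numbers have already been pulled out of the tcpdump output in order;
the function name and the threshold of 3 are illustrative choices, not
anything taken from the kernel or from tcpdump:

```python
from typing import Iterable, List, Tuple


def find_duplicate_ack_runs(
    acks: Iterable[int], threshold: int = 3
) -> List[Tuple[int, int]]:
    """Return (ack_number, repeat_count) for each run of identical ACKs.

    repeat_count counts the repeats after the first occurrence, and a
    run is reported only if it has at least `threshold` repeats.
    """
    runs: List[Tuple[int, int]] = []
    prev = None
    repeats = 0
    for ack in acks:
        if ack == prev:
            repeats += 1
            continue
        # The ACK number changed: close out the previous run.
        if prev is not None and repeats >= threshold:
            runs.append((prev, repeats))
        prev = ack
        repeats = 0
    # Close out the final run, which is the one a stalled link ends on.
    if prev is not None and repeats >= threshold:
        runs.append((prev, repeats))
    return runs


# Hypothetical ACK numbers as seen on the receiving side.
trace = [1000, 2000, 3000, 3000, 3000, 3000, 3000, 4000]
print(find_duplicate_ack_runs(trace))  # [(3000, 4)]
```

A long run that never ends in a higher ACK would match the stall as
remembered, but it cannot say whether the ACKs were lost or ignored.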

Last night it occurred to me that the sequence numbers could have
gotten screwed up inside the Van Jacobson style TCP/IP header
compression (or elsewhere), and that changing the window size has some
mysterious interaction with the compression code (i.e. the other end
thinks it is sending the right data, but the sequence numbers actually
coming out of the VJ compress/decompress cycle no longer match up).
Or it could be that the larger window used enough memory in their
hardware to cause problems when highly compressed data arrived.
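
To make the first guess concrete: VJ compression sends most header
fields, the sequence number among them, as differences from the
previous packet on the same connection, so the two ends stay in step
only while no compressed frame goes missing unnoticed. The toy model
below is a deliberately simplified sketch of that idea, not the RFC
1144 wire format and not the Linux slhc code; both function names are
made up for illustration, and it assumes the first frame always
arrives.

```python
from typing import List, Optional


def compress(seqs: List[int]) -> List[int]:
    """Encode the first sequence number in full, the rest as deltas."""
    out = [seqs[0]]
    for prev, cur in zip(seqs, seqs[1:]):
        out.append(cur - prev)
    return out


def decompress(frames: List[Optional[int]]) -> List[Optional[int]]:
    """Rebuild sequence numbers from deltas.

    None marks a frame lost on the link. The model simply skips it,
    which is what leaves the decompressor's saved state stale.
    """
    state = frames[0]
    if state is None:
        raise ValueError("this toy model needs the first frame intact")
    out: List[Optional[int]] = [state]
    for delta in frames[1:]:
        if delta is None:
            out.append(None)
            continue
        state += delta
        out.append(state)
    return out


sent = [1000, 1512, 2024, 2536, 3048]
frames: List[Optional[int]] = list(compress(sent))
frames[2] = None  # one compressed frame is dropped on the link
print(decompress(frames))  # [1000, 1512, None, 2024, 2536]
```

Every sequence number after the lost frame comes out short by the
missing delta (512 here) even though the sender believes it sent the
right data. Whether anything like this actually happened on that link
is not something the model can show.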

If one could see synchronized tcpdumps of both sides of the link one
would be in a better position to see what is going wrong.
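
As a sketch of what one might do with such a pair of dumps, assuming
each has already been reduced to a list of (time, sequence number,
length) records for the data direction: the record type and both
function names below are hypothetical, and the comparison ignores
retransmissions, so a segment sent twice and received once counts as
delivered.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    time: float   # capture timestamp in seconds, on the local clock
    seq: int      # TCP sequence number of the first data byte
    length: int   # payload length in bytes


def lost_in_transit(sent: List[Segment], received: List[Segment]) -> List[Segment]:
    """Segments captured on the sending side but never on the receiving side."""
    seen = {(s.seq, s.length) for s in received}
    return [s for s in sent if (s.seq, s.length) not in seen]


def unexpected_arrivals(sent: List[Segment], received: List[Segment]) -> List[Segment]:
    """Segments captured on the receiving side that the sender never sent.

    Entries here would point at sequence numbers being altered on the
    link rather than packets simply being dropped.
    """
    known = {(s.seq, s.length) for s in sent}
    return [r for r in received if (r.seq, r.length) not in known]


sender_side = [
    Segment(0.00, 1000, 512),
    Segment(0.10, 1512, 512),
    Segment(0.20, 2024, 512),
]
receiver_side = [
    Segment(0.05, 1000, 512),
    Segment(0.25, 2024, 512),
]
print(lost_in_transit(sender_side, receiver_side))
# [Segment(time=0.1, seq=1512, length=512)]
print(unexpected_arrivals(sender_side, receiver_side))
# []
```

Matching on sequence number and length rather than on timestamps
sidesteps the two machines' clocks not being synchronized.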

Hope this helps.
