Re: Problem with TCP (http initiated ftp garble)

Kaz Kylheku (bill@cafe.net)
Wed, 3 Jun 1998 13:51:29 -0700 (PDT)


On Wed, 3 Jun 1998, Barry Treahy wrote:

> The replacement of libc didn't do anything... Here is an example of what I'm
> seeing during the sample FTP of the vmlinuz and vmlinuz.gz files...
>
> ftp> cd /
> 250 CWD command successful.
> ftp> get vmlinuz
> 200 PORT command successful.
> 150 Opening BINARY mode data connection for vmlinuz (406263 bytes).
> #### <-- NOTE: at this point the xfer stalls... This is what the netstat looks
> like:
> mml1:/etc/rc.d# netstat -n
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address Foreign Address State
> tcp 0 124 xx.xx.xx.xx:23 yy.yy.yy.yy:4676 ESTABLISHED
> tcp 0 0 xx.xx.xx.xx:21 zz.zz.zz.zz:1710 ESTABLISHED
> tcp 0 49640 xx.xx.xx.xx:20 zz.zz.zz.zz:1712 ESTABLISHED
> and then I did a CTRL-C
> receive aborted

Could it be lost packets (particularly ACKs) during slow start? How long
did you actually wait for the apparently stalled transfer to resume?
Suppose that the sender is waiting for an ACK that never arrived.
It therefore can't increase its congestion window, and it cannot
advance, since its small window has already been transmitted and
none of it has been acknowledged.

You see, when a TCP connection is initiated, the sender is required not
to start spamming packets at the network right away, even if the receiver
has advertised a large window. It must pretend that the window is
one segment wide. As it receives acknowledgements, it can increase this
fake window (the congestion window).

Here is some relevant text from Stevens' TCP/IP Illustrated (vol 1, p 286):

`` The sender starts by transmitting one segment and waiting
for its ACK. When that ACK is received, the congestion window
is incremented from one to two, and two segments can be sent.
When each of those two segments is acknowledged, the congestion
window is increased to four. This provides an exponential increase''.
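
To make the doubling concrete, here is a rough sketch of that growth in
Python. It is only a toy model, not how any real stack is written: the
function name and the lost_ack_round parameter are made up for
illustration, it counts whole segments per round trip, and it ignores
timers entirely. It also assumes that *all* ACKs of one round go missing.

```python
def slow_start(rounds, lost_ack_round=None):
    """Return how many new segments the sender transmits in each round trip.

    Toy model: the congestion window (cwnd) starts at one segment and grows
    by one segment per ACK received, which doubles it every round trip.
    If lost_ack_round is given, every ACK of that round is assumed lost.
    """
    cwnd = 1
    sent_per_round = []
    for rnd in range(rounds):
        if lost_ack_round is not None and rnd > lost_ack_round:
            # The window is full of unacknowledged data: nothing new goes out.
            sent_per_round.append(0)
            continue
        sent_per_round.append(cwnd)
        if rnd == lost_ack_round:
            # No ACKs came back, so cwnd cannot grow.
            continue
        cwnd += cwnd  # one ACK per segment, +1 segment per ACK
    return sent_per_round


normal = slow_start(5)                      # 1, 2, 4, 8, 16 segments
stalled = slow_start(5, lost_ack_round=2)   # 1, 2, 4, then nothing
```

In the second call the sender puts out a few segments and then goes
quiet, which from the outside looks just like a hung transfer.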

Now suppose the ACK is lost in this early stage. It will then look like
you sent a little bit of data and then the transfer hung.
How do you get out of such a hung state? The receiver won't send you
duplicate ACKs since you aren't sending anything. Thus the flow will
start only if the sender takes some action; this action is triggered
by the retransmission timer, whose timeout doubles after each
unanswered retransmission, so recovery can take a while.
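
Since TCP's timers back off exponentially, the waiting adds up quickly if
the recovery attempts are lost too. A small sketch of the arithmetic in
Python follows. The 1.5 second starting value and the 64 second cap are
assumptions picked for illustration, not values read out of any particular
kernel, and backoff_schedule is a made-up helper name.

```python
def backoff_schedule(initial_timeout, max_timeout, attempts):
    """Return the wait (in seconds) before each successive timer expiry.

    The timeout doubles after every unanswered attempt, up to max_timeout.
    """
    waits = []
    timeout = initial_timeout
    for _ in range(attempts):
        waits.append(timeout)
        timeout = min(timeout * 2, max_timeout)
    return waits


# Assumed values: 1.5 s initial timeout, 64 s cap, 8 unanswered attempts.
waits = backoff_schedule(1.5, 64.0, 8)
total_wait = sum(waits)  # seconds spent waiting if all 8 attempts fail
```

With those assumed numbers the sender would sit idle for several minutes
before the eighth attempt, far longer than most people wait before
hitting CTRL-C.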

It really smells like you are having a network device problem---hardware
dropping packets on you. [[ Also, drivers can drop packets without TCP
knowing about it: on receipt, a network device will toss a packet if,
for instance, it can't atomically get the memory it needs to store
the packet. On sending, the network device subsystem will toss packets
if the transmit queue of a device is filled up. For an ethernet device,
the queue is 100 packets long. ]]
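
The transmit-side behaviour described in the aside amounts to a bounded
queue with tail drop. Here is a minimal sketch in Python, only to show the
idea. The class and method names are hypothetical and this is not the
kernel's code; the default of 100 is the ethernet queue length mentioned
above.

```python
from collections import deque


class TxQueue:
    """Toy model of a device transmit queue that drops packets when full."""

    def __init__(self, max_len=100):
        self.max_len = max_len
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        """Queue a packet for sending; return False if it was tossed."""
        if len(self.packets) >= self.max_len:
            # Queue is full: the packet is silently discarded. The sending
            # TCP only finds out later, when no ACK comes back.
            self.dropped += 1
            return False
        self.packets.append(packet)
        return True

    def transmit_one(self):
        """Hand the oldest queued packet to the hardware, if there is one."""
        return self.packets.popleft() if self.packets else None


# A burst of 105 packets with the hardware not draining the queue at all:
queue = TxQueue()
results = [queue.enqueue(n) for n in range(105)]
```

In this run the last five packets of the burst are discarded, and nothing
reports that back to the layer that handed them down.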

It would be instructive to see a tcpdump of the ethernet devices
from *both* machines when this happens. Comparing such dumps would
show whether segments went missing somewhere between the two interfaces.
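
Once you have both dumps, the comparison itself is simple. The Python
sketch below assumes you have already pulled (sequence number, length)
pairs out of each capture by whatever means you like; it does not parse
tcpdump output, and missing_segments is a hypothetical helper name. The
sample numbers are invented.

```python
def missing_segments(sent, received):
    """Return segments seen leaving the sender but never seen at the receiver.

    sent, received: iterables of (seq, length) tuples, one per captured
    segment. Retransmissions show up as repeated tuples in `sent`; they are
    reported once. A segment that only got through on a retransmission is
    NOT reported here, so compare counts as well if that matters.
    """
    received_set = set(received)
    missing = []
    for segment in dict.fromkeys(sent):  # de-duplicate, keep capture order
        if segment not in received_set:
            missing.append(segment)
    return missing


# Invented sample data: the second segment never shows up at the receiver,
# even though the sender transmitted it twice.
sender_dump = [(1, 1460), (1461, 1460), (2921, 1460), (1461, 1460)]
receiver_dump = [(1, 1460), (2921, 1460)]
lost = missing_segments(sender_dump, receiver_dump)
```

The same function can be run in the other direction on the ACKs, with the
receiver's dump as `sent` and the sender's dump as `received`.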

Also, the next time this problem occurs, don't kill the FTP. Go have
a coffee or something. See if the transfer will restart itself.
