TCP problem in 2.2.1

Greg Beeley (gbeeley@ix.netcom.com)
Fri, 26 Feb 1999 15:23:12 +0000


--------
(message originally posted to comp.os.linux.development.system)

Please CC: me personally on any replies/follow-ups as I am not subscribed
to this list.
--------

Hi,

I've come across a rather strange problem in the TCP stack in 2.2.1. I
glanced over the patches to 2.2.2 and none seemed to affect this part, most
were tuning things or SYN updates.

I have a TCP network connection between a Linux box and a Lantronix LRS16
server with a printer attached to a port. The LRS16 accepts connections
on a given TCP port to write data to the serial port. In this case, the LRS16's
port is 3001. The linux process makes a network connection, writes a bunch
of data, and does a standard lingering close.

Near the end of the connection's lifespan, the thing seems to lock up, with
netstat reporting state=FIN_WAIT_1 and send_q = some small number of bytes
(like 200-900 bytes). I did a tcpdump, and this is what came up:

11:25:48.523709 lrs16.3001 > adam.1286: . ack 33993 win 1406

Ok, the LRS16 has offered a window of 1406 bytes. Now, adam (our linux server)
will take advantage of part of that window.

11:25:48.523750 adam.1286 > lrs16.3001: P 33993:35297(1304) ack 1 win 32120 (DF)
11:25:48.573205 lrs16.3001 > adam.1286: . ack 35297 win 102

Good. the LRS16 acknowledges the bytes and offers the remaining window
of 102 bytes. So far, so good.

11:25:48.763800 adam.1286 > lrs16.3001: . 35297:35399(102) ack 1 win 32120 (DF)
11:25:48.813216 lrs16.3001 > adam.1286: . ack 35399 win 0

Ok, the linux server ate up the remaining 102 bytes. The LRS16 acknowledges
and tells the linux box to stop via a window of 0 bytes.

11:25:48.963788 adam.1286 > lrs16.3001: . 35297:35399(102) ack 1 win 32120 (DF)
11:25:48.964536 lrs16.3001 > adam.1286: . ack 35399 win 0

Hmmm... wow. The linux server didn't get the point. It seems to think that
the ACK was never received. But, the LRS16 happily acknowledged the seemingly
duplicate packet.

11:25:49.363786 adam.1286 > lrs16.3001: . 35297:35399(102) ack 1 win 32120 (DF)
11:25:49.364544 lrs16.3001 > adam.1286: . ack 35399 win 0
11:25:50.163787 adam.1286 > lrs16.3001: . 35297:35399(102) ack 1 win 32120 (DF)
11:25:50.164558 lrs16.3001 > adam.1286: . ack 35399 win 0
11:25:51.763791 adam.1286 > lrs16.3001: . 35297:35399(102) ack 1 win 32120 (DF)
11:25:51.764560 lrs16.3001 > adam.1286: . ack 35399 win 0
11:25:54.763929 lrs16.3001 > adam.1286: . ack 35399 win 1406
11:25:54.963788 adam.1286 > lrs16.3001: . 35297:35399(102) ack 1 win 32120 (DF)

Alright. Some time has passed and the buffers to the printer have cleared
out somewhat in the LRS16. The linux box, however, is still insistent on
transmitting those already-acknowledged 102 bytes.

11:25:54.964590 lrs16.3001 > adam.1286: . ack 35399 win 1406
11:25:56.493589 lrs16.3001 > adam.1286: . ack 35399 win 2048
11:26:01.363790 adam.1286 > lrs16.3001: . 35297:35399(102) ack 1 win 32120 (DF)

And after more time the LRS's window gets even larger. But, the linux
box is _still_ insistent on those 102 bytes, even though it still has quite
a few (nine-hundred-some) still in the send-q. The connection continued to
go on like this, with an occasional retransmit from the linux box of the 102
bytes and an acknowledge from the lrs16 offering the 2048 byte window.

Any ideas? I can send a full transcript of the tcpdump session if anyone
wants to take this one on.

Thanks.

--------
Greg.Beeley@LightSys.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/