Re: FOLLOWUP: Small problem with TCP socket opening

Ricky Beam (root@defiant.interpath.net)
Tue, 26 Aug 1997 14:34:21 -0400 (EDT)


Letting the chips fall where they may, I quote Matthias Urlichs:
>I don't understand why should the connection be reset here in the first
>place.

I'm not sure either (and I don't want to look through RFC1123; it's too
bloated [rightly so, IMHO]), but that's what it is doing:

Aug 26 03:36:17
..............................[jiffies]
tcp_ipv4.c: tcp_v4_connect(): [000263EC] 199.72.252.1:1054 -> 199.45.111.6:2056 tp->rto: 300
tcp_timer.c: tcp_reset_xmit_timer: [000263EC] 199.72.252.1:1054 -> 199.45.111.6:2056 tm: 300
tcp_timer.c: tcp_retransmit_timer: [00026518] 199.72.252.1:1054 -> 199.45.111.6:2056 Syn Tm: 450
tcp_timer.c: tcp_reset_xmit_timer: [00026518] 199.72.252.1:1054 -> 199.45.111.6:2056 tm: 450
tcp_timer.c: tcp_retransmit_timer: [000266DA] 199.72.252.1:1054 -> 199.45.111.6:2056 Syn Tm: 675
tcp_timer.c: tcp_reset_xmit_timer: [000266DA] 199.72.252.1:1054 -> 199.45.111.6:2056 tm: 675
tcp_timer.c: tcp_retransmit_timer: [0002697D] 199.72.252.1:1054 -> 199.45.111.6:2056 Syn Tm: 750
tcp_timer.c: tcp_reset_xmit_timer: [0002697D] 199.72.252.1:1054 -> 199.45.111.6:2056 tm: 750
tcp_timer.c: tcp_retransmit_timer: [00026C6B] 199.72.252.1:1054 -> 199.45.111.6:2056 Syn Tm: 750
tcp_timer.c: tcp_reset_xmit_timer: [00026C6B] 199.72.252.1:1054 -> 199.45.111.6:2056 tm: 750
tcp_timer.c: tcp_retransmit_timer: [00026F59] 199.72.252.1:1054 -> 199.45.111.6:2056 Syn Tm: 750
tcp_timer.c: tcp_reset_xmit_timer: [00026F59] 199.72.252.1:1054 -> 199.45.111.6:2056 tm: 750
tcp_timer.c: tcp_clear_xmit_timer: [00026F67] 199.72.252.1:1054 -> 199.45.111.6:2056
tcp_input.c: tcp_rcv_state_process(): [00026F67] 199.72.252.1:1054 -> 199.45.111.6:2056 Syn: Ack-Syn [reset]

Aug 26 12:34:16
..............................[jiffies]
tcp_ipv4.c: tcp_v4_connect(): [0033A4C0] 199.72.252.1:2608 -> 205.149.163.212:2056 tp->rto: 300
tcp_timer.c: tcp_reset_xmit_timer: [0033A4C0] 199.72.252.1:2608 -> 205.149.163.212:2056 tm: 300
tcp_timer.c: tcp_retransmit_timer: [0033A5EC] 199.72.252.1:2608 -> 205.149.163.212:2056 Syn Tm: 450
tcp_timer.c: tcp_reset_xmit_timer: [0033A5EC] 199.72.252.1:2608 -> 205.149.163.212:2056 tm: 450
tcp_timer.c: tcp_clear_xmit_timer: [0033A5F7] 199.72.252.1:2608 -> 205.149.163.212:2056
tcp_input.c: tcp_rcv_state_process(): [0033A5F7] 199.72.252.1:2608 -> 205.149.163.212:2056 Syn: Ack-Syn [reset]
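
To check the bracketed jiffies stamps against the tm values, something like
the following could be used. It is only a sketch: trace_jiffies() and
trace_delta() are made-up helper names, and the line layout is assumed to be
exactly what the traces above show (one "[hex]" field per line).

```c
#include <stdio.h>

/* Extract the bracketed hex jiffies stamp from one trace line.
 * Returns 1 on success, 0 if the line has no "[hex]" field
 * (for example the "....[jiffies]" header line). */
int trace_jiffies(const char *line, unsigned long *out)
{
    /* Skip everything up to the first '[', then read a hex number. */
    return sscanf(line, "%*[^[][%lx", out) == 1;
}

/* Jiffies elapsed between two trace lines, or -1 if either one
 * fails to parse. */
long trace_delta(const char *earlier, const char *later)
{
    unsigned long a, b;

    if (!trace_jiffies(earlier, &a) || !trace_jiffies(later, &b))
        return -1;
    return (long)(b - a);
}
```

Fed the tcp_v4_connect() line and the first tcp_retransmit_timer line of the
first trace, trace_delta() gives 300, which matches the initial tm. The last
two lines of that trace are 14 jiffies apart, so the reset arrives shortly
after the final SYN retransmit.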

Legend:
tcp_reset_xmit_timer: tm ==> 'when'
tcp_retransmit_timer: Syn Tm ==> Indicates SYN TIMEOUT, the number is the
backed off jiffies to the next timeout.
tcp_rcv_state_process: Syn: (indicates sk->state == TCP_SYN_SENT)
the rest breaks out the conditions in that case and what is about
to happen [...]
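
The Syn Tm values in the first trace (300, 450, 675, 750, 750, 750) look
like a timeout that grows by half on each retransmit and is capped at 750
jiffies. A minimal sketch of that pattern follows. The growth factor and the
cap are read off the trace, not taken from tcp_timer.c, and next_syn_rto()
and both constants are hypothetical names.

```c
/* Hypothetical constants inferred from the trace output only. */
#define SYN_RTO_START 300UL   /* first tm value logged by tcp_v4_connect() */
#define SYN_RTO_CAP   750UL   /* largest Syn Tm value seen in the trace */

/* Return the next backed-off SYN timeout in jiffies:
 * grow the current value by half, then clamp to the cap. */
unsigned long next_syn_rto(unsigned long rto)
{
    unsigned long next = rto + rto / 2;

    return next > SYN_RTO_CAP ? SYN_RTO_CAP : next;
}
```

Starting from SYN_RTO_START, repeated calls give 450, 675, 750, 750, which
is the sequence in the first trace.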

It looks like sk->zapped is causing the kernel to destroy the retransmit
timer and hang the connection. I have added further debugs to show this.
If you want to see the entire kernel trace from boot time (~ 1100 lines)
then I can make that available (not via the list however.)

>We sent a SYN, get an ACK back. The connection is now half-open. We're
>still waiting for their SYN so that we can send our ACK back to them. After
>that, the connection would be fully open.

It would appear that Linux won't half-open a socket.

>The usual practice of sending SYN+ACK replies when we're passively opened
>is only an optimization which can be skipped, eg. if both sides
>simultaneously open a connection to the other (yes, TCP explicitly allows
>this, read the RFC).

I didn't write the TCP stack, I just point out why it isn't working right.
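
For reference, here is a rough sketch of the SYN-SENT segment rules as they
read in RFC 793 (security and precedence checks left out). It is not the
code in tcp_input.c; struct segment, the enum and syn_sent_input() are
invented for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* What to do with a segment that arrives while in SYN-SENT. */
enum syn_sent_action {
    SS_SEND_RST,      /* ACK outside the acceptable range: send a reset */
    SS_ABORT,         /* acceptable ACK carrying RST: connection reset */
    SS_DROP,          /* ignore the segment, stay in SYN-SENT */
    SS_ESTABLISHED,   /* SYN plus acceptable ACK: active open completes */
    SS_SYN_RECEIVED   /* SYN without ACK: simultaneous open */
};

/* Only the header fields this sketch needs (hypothetical layout). */
struct segment {
    bool syn, ack, rst;
    uint32_t ack_seq;
};

/* iss: our initial sequence number; snd_nxt: next sequence we will send. */
enum syn_sent_action syn_sent_input(const struct segment *seg,
                                    uint32_t iss, uint32_t snd_nxt)
{
    if (seg->ack) {
        /* Acceptable means ISS < SEG.ACK <= SND.NXT, compared modulo 2^32. */
        bool acceptable = (int32_t)(seg->ack_seq - iss) > 0 &&
                          (int32_t)(seg->ack_seq - snd_nxt) <= 0;
        if (!acceptable)
            return seg->rst ? SS_DROP : SS_SEND_RST;
    }
    if (seg->rst)
        return seg->ack ? SS_ABORT : SS_DROP;
    if (seg->syn)
        return seg->ack ? SS_ESTABLISHED : SS_SYN_RECEIVED;
    return SS_DROP;   /* neither SYN nor RST set */
}
```

Under these rules a bare ACK with an acceptable ack number is dropped and
the socket stays in SYN-SENT; only an ACK outside that range draws a reset.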

I'm just wondering why dominion is the only machine to be experiencing this
phenomenon???

All of the following machines are running the identical application:

Dominion: (connect stalls)
Kernel== 2.1.51-SMP (+ kernel tcp trace code)
Dual P200
128M 60ns EDO
Tyan Tomcat III
Adaptec 2940UW + Atlas I 4.3G system disk
Intel EEPro 100/B (eth0)
Intel EEPro 10+/ISA (previous eth0, only reduced the checksum errors)
Connects via ethernet to Netopia PN655/v3.1 -> USR TC (3.5.??) -> Core network

Auth3: (no stall)
Kernel== 2.1.51-SMP (same kernel as dominion - kernel tcp trace code)
Dual PPro200
128M 50ns SDRAM
PR440 MB
Adaptec 2940UW (on board) + Quantum 1G system disk
Intel EEPro 100/B (on board)
Connects via ethernet to Core network

Lacota: (no stall)
Kernel== 2.0.27
P120
32M 60ns FPM (parity)
Intel Endeavor MB
Adaptec 2940U + HP [not so]Sure Store 2Gig system disk
3com 3c509
Connects via ethernet to Core network

Hoppy/Toad: (no stall)
Kernel== 2.0.27 (lacota source + sound driver buffer modifications)
PPro200
64M 60ns FPM (parity)
??? MB
NCR 53c825 + HP [not so]Sure Store 2Gig system disk
3com 3c595 (@ 10mbs)
Connects via ethernet to Core [web] network

--Ricky