Re: [PATCH net-next 3/3] net: tcp: check timeout by icsk->icsk_timeout in tcp_retransmit_timer()

From: Menglong Dong
Date: Mon Jul 31 2023 - 04:24:55 EST


On Fri, Jul 28, 2023 at 10:25 PM Neal Cardwell <ncardwell@xxxxxxxxxx> wrote:
>
> On Fri, Jul 28, 2023 at 1:50 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
[...]
>
> In that packetdrill case AFAICT that is the ZWP timer firing, and the
> sender sends a ZWP.
>
> I think maybe Menglong is looking more at something like the following
> scenario, where at the time the RTO timer fires the data sender finds
> the tp->snd_wnd is zero, so it sends a retransmit of the
> lowest-sequence data packet.
>
> Here is a packetdrill case and the tcpdump trace on an upstream
> net-next kernel... I have not worked out all the details at the end,
> but perhaps it can help move the discussion forward:
>
>
> ~/packetdrill/gtests/net/tcp/receiver_window# cat rwin-rto-zero-window.pkt
> // Test how sender reacts to unexpected arrival of an rwin of 0.
>
> `../common/defaults.sh`
>
> // Create a socket.
> 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> // Establish a connection.
> +.1 < S 0:0(0) win 65535 <mss 1000,nop,nop,sackOK,nop,wscale 6>
> +0 > S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK,nop,wscale 14>
> +.1 < . 1:1(0) ack 1 win 457
> +0 accept(3, ..., ...) = 4
>
> +0 write(4, ..., 20000) = 20000
> +0 > P. 1:10001(10000) ack 1
>
> // TLP
> +.2 > . 10001:11001(1000) ack 1
> // Receiver has retracted rwin to 0
> // (perhaps from the 2023 proposed OOM code?).
> +.1 < . 1:1(0) ack 1 win 0
>
> // RTO, and in tcp_retransmit_timer() we see the receiver window is zero,
> // so we take the special if (!tp->snd_wnd...) code path.
> +.2 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +.5 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +1.2 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +2.4 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +4.8 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +9.6 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +19.2 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +38.4 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +76.8 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +120 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1 win 0
>
> +120 > . 1:1001(1000) ack 1
> +.1 < . 1:1(0) ack 1001 win 1000
>
> // Received non-zero window update. Send more data.
> +0 > P. 1001:3001(2000) ack 1
> +.1 < . 1:1(0) ack 3001 win 1000
>
> ----------
> When I run that script on a net-next kernel I see the rounding up of
> the RTO to 122 secs rather than 120 secs, but for whatever reason the
> script does not cause the socket to die early...
>

I think I know the reason now. Without the 2nd patch that I sent
in this series, your ACK can't update the rwin to 0, as it is
ignored by tcp_may_update_window().
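A simplified, stand-alone sketch of that check follows. It assumes the pre-patch condition in tcp_may_update_window(): the advertised window is taken only if the ACK advances snd_una, the segment is newer than the one that last updated the window (snd_wl1), or it is that same segment and the window grows. `struct sender_state`, `seq_before()`, `seq_after()` and `may_update_window()` are hypothetical names, and window scaling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the sender fields involved (relative sequence
 * numbers, window already unscaled). */
struct sender_state {
    uint32_t snd_una; /* oldest unacknowledged sequence number */
    uint32_t snd_wl1; /* seq of the segment that last updated snd_wnd */
    uint32_t snd_wnd; /* current send window in bytes */
};

/* Wraparound-safe sequence comparisons, same idea as the kernel's
 * before()/after(). */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

static bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return seq_before(seq2, seq1);
}

/* Assumed pre-patch rule: accept the advertised window nwin only if the
 * ACK advances snd_una, the segment is newer than snd_wl1, or it is the
 * same segment and the window grows. A pure duplicate ACK (same ack,
 * same seq) that shrinks the window is therefore ignored. */
static bool may_update_window(const struct sender_state *tp, uint32_t ack,
                              uint32_t ack_seq, uint32_t nwin)
{
    /* ACKs below snd_una are assumed to be filtered out by the caller. */
    assert(!seq_before(ack, tp->snd_una));
    return seq_after(ack, tp->snd_una) ||
           seq_after(ack_seq, tp->snd_wl1) ||
           (ack_seq == tp->snd_wl1 && nwin > tp->snd_wnd);
}
```

Assuming snd_una = snd_wl1 = 1 after the handshake (relative sequence numbers, as in the scripts), "ack 1 win 0" fails all three tests and is ignored, while "ack 1001 win 0" advances snd_una and is accepted.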

However, you can send an ACK that acknowledges new data to
update the rwin to 0. I modified your script accordingly, and the
socket dies as we expected:

// Test how sender reacts to unexpected arrival of an rwin of 0.

`../common/defaults.sh`

// Create a socket.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Establish a connection.
+.1 < S 0:0(0) win 65535 <mss 1000,nop,nop,sackOK,nop,wscale 6>
+0 > S. 0:0(0) ack 1 win 65535 <mss 1440,nop,nop,sackOK,nop,wscale 8>
+.1 < . 1:1(0) ack 1 win 457
+0 accept(3, ..., ...) = 4

+0 write(4, ..., 20000) = 20000
+0 > P. 1:10001(10000) ack 1

// Update the window to 0. "ack 1 win 0" won't update the window, as it
// is ignored by tcp_may_update_window()
+.1 < . 1:1(0) ack 1001 win 0

// RTO, and in tcp_retransmit_timer() we see the receiver window is zero,
// so we take the special if (!tp->snd_wnd...) code path.
+.2 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+.5 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+1.2 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+2.4 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+4.8 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+9.6 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+19.2 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+38.4 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

+76.8 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0

// The socket will die in tcp_retransmit_timer() in the
// "tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX" code path,
// so the following retransmit won't happen.
+120 > . 1001:2001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 0
------------------------------------------------------------------------------
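The retransmissions in the script are spaced by the usual exponential backoff: the RTO doubles after each expiration until it reaches TCP_RTO_MAX (120 s). A minimal sketch of that schedule, assuming millisecond units and a first RTO of about 300 ms (roughly what Neal's trace shows for the first timeouts); `rto_after_backoff()` is a hypothetical helper modeled on, not copied from, the kernel's capped backoff:

```c
#include <stdint.h>

/* TCP_RTO_MAX is 120 seconds; milliseconds are used here instead of jiffies. */
#define RTO_MAX_MS 120000u

/* Hypothetical helper: the RTO after `backoff` consecutive expirations,
 * i.e. the base RTO doubled once per expiration and clamped to
 * RTO_MAX_MS. The shift is done in 64 bits so it cannot overflow. */
static uint32_t rto_after_backoff(uint32_t base_ms, unsigned int backoff)
{
    if (backoff >= 32)
        return RTO_MAX_MS; /* any positive base has long passed the cap */
    uint64_t when = (uint64_t)base_ms << backoff;
    return when < RTO_MAX_MS ? (uint32_t)when : RTO_MAX_MS;
}
```

With a 300 ms base this yields 300, 600, 1200, ..., 76800 ms and then stays at 120000 ms once doubling would exceed the cap, which roughly matches the +76.8 and +120 steps in the script.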

I don't know how to check with packetdrill whether the socket
has died, so I checked it with:

ss -nitme | grep 8080 | grep on

and I can see the socket die after the 120-second timer
expires.

$ packetdrill ./rwin-rto-zero-window.pkt
./rwin-rto-zero-window.pkt:55: error handling packet: Timed out waiting for packet
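The check that kills the socket compares the time since the last received ACK (tp->rcv_tstamp) with TCP_RTO_MAX. A simplified sketch, assuming millisecond timestamps instead of jiffies; `zero_window_timed_out()` is a hypothetical name, and the other conditions of that branch (orphaned sockets, SYN states) are left out:

```c
#include <stdbool.h>
#include <stdint.h>

/* TCP_RTO_MAX is 120 seconds; milliseconds stand in for jiffies here. */
#define RTO_MAX_MS 120000u

/* Hypothetical helper modeling only the timeout test of the zero-window
 * branch in tcp_retransmit_timer():
 *     tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX
 * rcv_tstamp_ms is when the last ACK arrived. The unsigned subtraction
 * stays correct when the 32-bit clock wraps around. */
static bool zero_window_timed_out(uint32_t snd_wnd, uint32_t now_ms,
                                  uint32_t rcv_tstamp_ms)
{
    if (snd_wnd != 0)
        return false; /* window open: this branch is not taken */
    return (uint32_t)(now_ms - rcv_tstamp_ms) > RTO_MAX_MS;
}
```

In this model, an ACK that arrives 0.1 s after a retransmission, followed by a timer that fires exactly 120 s after that retransmission, leaves 119.9 s since the last ACK and the connection survives; a later timer, like the roughly 122.8 s gap between the last zero-window ACK and the next retransmission in Neal's trace, exceeds the limit.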


> The tcpdump trace:
>
> tcpdump -ttt -n -i any port 8080 &
>
> ->
>
> ~/packetdrill/gtests/net/tcp/receiver_window# ../../packetdrill/packetdrill rwin-rto-zero-window.pkt
> 00:01:01.370344 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,sackOK,nop,wscale 6], length 0
> 00:00:00.000096 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [S.], seq 3847169154, ack 1, win 65535, options [mss 1460,nop,nop,sackOK,nop,wscale 14], length 0
> 00:00:00.100277 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 457, length 0
> 00:00:00.000090 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [P.], seq 1:2001, ack 1, win 4, length 2000: HTTP
> 00:00:00.000006 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [P.], seq 2001:4001, ack 1, win 4, length 2000: HTTP
> 00:00:00.000003 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [P.], seq 4001:6001, ack 1, win 4, length 2000: HTTP
> 00:00:00.000002 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [P.], seq 6001:8001, ack 1, win 4, length 2000: HTTP
> 00:00:00.000001 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [P.], seq 8001:10001, ack 1, win 4, length 2000: HTTP
> 00:00:00.209131 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 10001:11001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100190 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:00:00.203824 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100175 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:00:00.507835 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100192 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:00:01.115858 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100182 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:00:02.331747 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100198 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:00:04.955980 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100197 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:00:09.627985 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100179 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:00:19.355725 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100203 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:00:42.395633 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100202 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:01:17.724059 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100201 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:02:02.779516 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100229 tun0 In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1, win 0, length 0
> 00:02:02.779828 tun0 Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
> 00:00:00.100230 ? In IP 192.0.2.1.51231 > 192.168.56.132.8080: Flags [.], ack 1001, win 1000, length 0
> 00:00:00.000034 ? Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 11001:12001, ack 1, win 4, length 1000: HTTP
> 00:00:00.000005 ? Out IP 192.168.56.132.8080 > 192.0.2.1.51231: Flags [.], seq 12001:13001, ack 1, win 4, length 1000: HTTP
>
> rwin-rto-zero-window.pkt:62: error handling packet: live packet field tcp_psh: expected: 1 (0x1) vs actual: 0 (0x0)
> script packet: 405.390244 P. 1001:3001(2000) ack 1
> actual packet: 405.390237 . 11001:13001(2000) ack 1 win 4