[PATCH net] tcp: fix keepalive when data remain undelivered

From: Enke Chen
Date: Fri Feb 19 2021 - 19:57:46 EST


From: Enke Chen <enchen@xxxxxxxxxxxxxxxxxxxx>

TCP keepalive does not timeout under the condition that network connection
is lost and data remain undelivered (incl. retransmit). A very simple
scenarios of the failure is to write data to a tcp socket after the network
connection is lost.

Under the specified condition the keepalive timeout is not evaluated in
the keepalive timer. That is the primary cause of the failure. In addition,
the keepalive probe is not sent out in the keepalive timer. Although packet
retransmit or 0-window probe can serve a similar purpose, they have their
own timers and backoffs that are generally not aligned with the keepalive
parameters for probes and timeout.

As the timing and conditions of the events involved are random, the tcp
keepalive can fail randomly. Given the randomness of the failures, fixing
the issue would not cause any backward compatibility issues. As was well
said, "Determinism is a special case of randomness".

The fix in this patch consists of the following:

a. Always evaluate the keepalive timeout in the keepalive timer.

b. Always send out the keepalive probe in the keepalive timer (post the
keepalive idle time). Given that the keepalive intervals are usually
in the range of 30 - 60 seconds, there is no need for an optimization
to further reduce the number of keepalive probes in the presence of
packet retransmit.

c. Use the elapsed time (instead of the 0-window probe counter) in
evaluating tcp keepalive timeout.

Thanks to Eric Dumazet, Neal Cardwell, and Yuchung Cheng for helpful
discussions about the issue and options for fixing it.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2 Initial git repository build")
Signed-off-by: Enke Chen <enchen@xxxxxxxxxxxxxxxxxxxx>
---
net/ipv4/tcp_timer.c | 20 ++++++--------------
1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 4ef08079ccfa..16a044da20db 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -708,29 +708,23 @@ static void tcp_keepalive_timer (struct timer_list *t)
((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)))
goto out;

- elapsed = keepalive_time_when(tp);
-
- /* It is alive without keepalive 8) */
- if (tp->packets_out || !tcp_write_queue_empty(sk))
- goto resched;
-
elapsed = keepalive_time_elapsed(tp);

if (elapsed >= keepalive_time_when(tp)) {
/* If the TCP_USER_TIMEOUT option is enabled, use that
* to determine when to timeout instead.
*/
- if ((icsk->icsk_user_timeout != 0 &&
- elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) &&
- icsk->icsk_probes_out > 0) ||
- (icsk->icsk_user_timeout == 0 &&
- icsk->icsk_probes_out >= keepalive_probes(tp))) {
+ u32 timeout = icsk->icsk_user_timeout ?
+ msecs_to_jiffies(icsk->icsk_user_timeout) :
+ keepalive_intvl_when(tp) * keepalive_probes(tp) +
+ keepalive_time_when(tp);
+
+ if (elapsed >= timeout) {
tcp_send_active_reset(sk, GFP_ATOMIC);
tcp_write_err(sk);
goto out;
}
if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
- icsk->icsk_probes_out++;
elapsed = keepalive_intvl_when(tp);
} else {
/* If keepalive was lost due to local congestion,
@@ -744,8 +738,6 @@ static void tcp_keepalive_timer (struct timer_list *t)
}

sk_mem_reclaim(sk);
-
-resched:
inet_csk_reset_keepalive_timer (sk, elapsed);
goto out;

--
2.29.2