Re: [PATCH net] rxrpc: Fix lockup due to no error backoff after ack transmit error

From: David Miller
Date: Sat Nov 03 2018 - 03:00:22 EST


From: David Howells <dhowells@xxxxxxxxxx>
Date: Thu, 01 Nov 2018 13:39:53 +0000

> If the network becomes (partially) unavailable, say by disabling IPv6, the
> background ACK transmission routine can get itself into a tizzy by
> proposing immediate ACK retransmission. Since we're in the call event
> processor, that happens immediately without returning to the workqueue
> manager.
>
> The condition should clear after a while when either the network comes back
> or the call times out.
>
> Fix this by:
>
>  (1) When re-proposing an ACK on failed Tx, don't schedule it immediately.
>      This will allow a certain amount of time to elapse before we try
>      again.
>
>  (2) Enforce a return to the workqueue manager after a certain number of
>      iterations of the call processing loop.
>
>  (3) Add a backoff delay that increases the delay on deferred ACKs by a
>      jiffy per failed transmission to a limit of HZ.  The backoff delay is
>      cleared on a successful return from kernel_sendmsg().
>
>  (4) Cancel calls immediately if the opening sendmsg fails.  The layer
>      above can arrange retransmission or rotate to another server.
>
> Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
> Signed-off-by: David Howells <dhowells@xxxxxxxxxx>

Applied and queued up for -stable.
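
For readers following the thread, points (2) and (3) of the patch
description can be sketched in user-space C roughly as below. This is
not the code from the patch. The struct, the function names, the HZ
value and the MAX_LOOPS limit are all hypothetical and chosen only for
illustration; the description itself says just "a certain number of
iterations" and "a limit of HZ".

```c
#include <assert.h>
#include <stdbool.h>

#define HZ        250  /* assumed tick rate; the kernel's HZ is set at build time */
#define MAX_LOOPS 5    /* assumed cap; the patch description gives no number */

/* Hypothetical stand-in for per-call state, not the real struct rxrpc_call. */
struct call_sketch {
	unsigned long tx_backoff;	/* extra jiffies added to deferred ACKs */
};

/*
 * Point (3): grow the backoff by one jiffy per failed transmission, up to
 * HZ, and clear it when a transmission succeeds.  'ret' mimics the return
 * value of a send call: negative on error, non-negative on success.
 */
static void note_tx_result(struct call_sketch *call, int ret)
{
	if (ret < 0) {
		if (call->tx_backoff < HZ)
			call->tx_backoff++;
	} else {
		call->tx_backoff = 0;
	}
}

/* Delay, in jiffies, to apply to a re-proposed (deferred) ACK. */
static unsigned long deferred_ack_delay(const struct call_sketch *call,
					unsigned long base_delay)
{
	assert(call->tx_backoff <= HZ);	/* invariant kept by note_tx_result() */
	return base_delay + call->tx_backoff;
}

/*
 * Point (2): handle at most MAX_LOOPS pending events in one invocation.
 * Returns true if work remains, in which case the caller should requeue
 * the work item rather than keep spinning in the processing loop.
 */
static bool drain_events(unsigned int *pending)
{
	unsigned int loops = 0;

	while (*pending > 0 && loops < MAX_LOOPS) {
		(*pending)--;	/* stand-in for handling one event */
		loops++;
	}
	return *pending > 0;
}
```

With these definitions, each failed send lengthens the delay returned
by deferred_ack_delay() by one jiffy until it is capped at base_delay
plus HZ, and a single successful send resets it to base_delay. When
drain_events() returns true, the caller would hand control back to the
workqueue instead of looping, which is the behaviour that avoids the
lockup described above.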