Re: [PATCH 2/2] tg3: Fix the TX ring stall

From: Michael Chan
Date: Wed Nov 01 2023 - 17:10:08 EST


On Wed, Nov 1, 2023 at 12:19 PM <alexey.pakhunov@xxxxxxxxxx> wrote:
>
> From: Alex Pakhunov <alexey.pakhunov@xxxxxxxxxx>
>
> The TX ring maintained by the tg3 driver can end up in a state, when it
> has packets queued for sending but the NIC hardware is not informed, so no
> progress is made. This leads to a multi-second interruption in network
> traffic followed by dev_watchdog() firing and resetting the queue.
>
> The specific sequence of steps is:
>
> 1. tg3_start_xmit() is called at least once and queues packet(s) without
> updating tnapi->prodmbox (netdev_xmit_more() returns true)
> 2. tg3_start_xmit() is called with an SKB which causes tg3_tso_bug() to be
> called.
> 3. tg3_tso_bug() determines that the SKB is too large, ...
>
> if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {
>
> ... stops the queue, and returns NETDEV_TX_BUSY:
>
> netif_tx_stop_queue(txq);
> ...
> if (tg3_tx_avail(tnapi) <= frag_cnt_est)
> return NETDEV_TX_BUSY;
>
> 4. Since all tg3_tso_bug() call sites directly return, the code updating
> tnapi->prodmbox is skipped.

Thanks for the patch. An alternative fix that may be simpler is to
add a goto after calling tg3_tso_bug(). Something like this:

tg3_tso_bug();
goto update_tx_mbox;
...

update_tx_mbox:
if (!netdev_xmit_more() || netif_xmit_stopped())
tw32_tx_mbox();
...

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature