Re: [PATCH v3] net: macb: restart tx after tx used bit read

From: Tomas Melin
Date: Wed Mar 23 2022 - 04:09:09 EST


Hi,

> From: <Claudiu.Beznea@xxxxxxxxxxxxx>
> To: <Nicolas.Ferre@xxxxxxxxxxxxx>, <davem@xxxxxxxxxxxxx>
> Cc: <netdev@xxxxxxxxxxxxxxx>, <linux-kernel@xxxxxxxxxxxxxxx>,
> <Claudiu.Beznea@xxxxxxxxxxxxx>
> Subject: [PATCH v3] net: macb: restart tx after tx used bit read
> Date: Mon, 17 Dec 2018 10:02:42 +0000
> Message-ID: <1545040937-6583-1-git-send-email-claudiu.beznea@xxxxxxxxxxxxx>
>
> From: Claudiu Beznea <claudiu.beznea@xxxxxxxxxxxxx>
>
> On some platforms (currently detected only on SAMA5D4) TX might get stuck
> even though the packets are still present in DMA memories and TX start was
> issued for them. This happens due to race condition between MACB driver
> updating next TX buffer descriptor to be used and IP reading the same
> descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
> GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
> is asserted TX must be restarted. Restart TX if used bit is read and
> packets are present in software TX queue. Packets are removed from software
> TX queue if TX was successful for them (see macb_tx_interrupt()).
>
> Signed-off-by: Claudiu Beznea <claudiu.beznea@xxxxxxxxxxxxx>

On Xilinx Zynq the above change can cause an infinite interrupt loop, leading
to a CPU stall. The timing/load seems to need to be just right for this to
happen; currently, with 1G Ethernet, it can typically be triggered within
minutes when running stress tests on the network interface.

The events leading up to the interrupt looping are similar to the issue described in the
commit message. However, in our case restarting TX does not help at all. Instead,
the controller is stuck on the queue end descriptor, generating endless TX_USED
interrupts and never breaking out of the interrupt routine.
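
The looping described above can be sketched with a toy model. This is not
driver code: the toy_* names, the stuck_on_used flag and the iteration bound
are all hypothetical, and the assumption that a stuck controller re-raises
TXUBR on every TSTART is only one way to read the observed behaviour.
toy_tx_restart() mirrors the shape of macb_tx_restart() from the patch.

```c
#include <stdbool.h>

/* Toy model only: names mimic the driver, but nothing here is kernel code. */
#define TXUBR (1u << 0) /* "TX used bit read" status bit */
#define TCOMP (1u << 1) /* "TX complete" status bit */

struct toy_queue {
	unsigned int tx_head; /* next descriptor software will fill */
	unsigned int tx_tail; /* oldest descriptor not yet reclaimed */
	unsigned int isr;     /* pending interrupt status bits */
	bool stuck_on_used;   /* assumption: controller parked on a used descriptor */
};

/* Model of writing TSTART. A healthy controller sends the queued frames;
 * a stuck one (assumed) reads the used bit again and only raises TXUBR. */
static void toy_tstart(struct toy_queue *q)
{
	if (q->stuck_on_used) {
		q->isr |= TXUBR;
	} else if (q->tx_head != q->tx_tail) {
		q->tx_tail = q->tx_head;
		q->isr |= TCOMP;
	}
}

/* Same shape as macb_tx_restart(): ack TXUBR, restart if frames are queued. */
static void toy_tx_restart(struct toy_queue *q)
{
	q->isr &= ~TXUBR;
	if (q->tx_head == q->tx_tail)
		return;
	toy_tstart(q);
}

/* Interrupt routine that loops while any status bit is pending.
 * max_iter bounds the demo; returns the number of passes made. */
static unsigned int toy_interrupt(struct toy_queue *q, unsigned int max_iter)
{
	unsigned int passes = 0;

	while (q->isr && passes < max_iter) {
		unsigned int status = q->isr;

		if (status & TCOMP)
			q->isr &= ~TCOMP;
		if (status & TXUBR)
			toy_tx_restart(q);
		passes++;
	}
	return passes;
}
```

With stuck_on_used false and frames queued, toy_interrupt() exits after two
passes (one restart, one completion). With stuck_on_used true, every restart
raises TXUBR again, so the loop only ends when it hits max_iter.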

Any chance you remember more details about the situation in which restarting TX
helped for your use case? Was tx_qbar at the end of a frame, or stopped in the
middle of a frame?

thanks,
Tomas Melin


> ---
>
> Changes in v3:
> - remove "inline" keyword
>
> Changes in v2:
> - use "static inline" instead of "inline static" for macb_tx_restart()
>
> drivers/net/ethernet/cadence/macb_main.c | 21 ++++++++++++++++++++-
> 1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 1d86b4d5645a..f920230386ee 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -61,7 +61,8 @@
> #define MACB_TX_ERR_FLAGS (MACB_BIT(ISR_TUND) \
> | MACB_BIT(ISR_RLE) \
> | MACB_BIT(TXERR))
> -#define MACB_TX_INT_FLAGS (MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP))
> +#define MACB_TX_INT_FLAGS (MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP) \
> + | MACB_BIT(TXUBR))
>
> /* Max length of transmit frame must be a multiple of 8 bytes */
> #define MACB_TX_LEN_ALIGN 8
> @@ -1312,6 +1313,21 @@ static void macb_hresp_error_task(unsigned long data)
> netif_tx_start_all_queues(dev);
> }
>
> +static void macb_tx_restart(struct macb_queue *queue)
> +{
> +	unsigned int head = queue->tx_head;
> +	unsigned int tail = queue->tx_tail;
> +	struct macb *bp = queue->bp;
> +
> +	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> +		queue_writel(queue, ISR, MACB_BIT(TXUBR));
> +
> +	if (head == tail)
> +		return;
> +
> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> +}
> +
> static irqreturn_t macb_interrupt(int irq, void *dev_id)
> {
> struct macb_queue *queue = dev_id;
> @@ -1369,6 +1385,9 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
> if (status & MACB_BIT(TCOMP))
> macb_tx_interrupt(queue);
>
> + if (status & MACB_BIT(TXUBR))
> + macb_tx_restart(queue);
> +
> /* Link change detection isn't possible with RMII, so we'll
> * add that if/when we get our hands on a full-blown MII PHY.
> */
> --
> 2.7.4
>