Re: [PATCH v2 1/2] tg3: Increment tx_dropped in tg3_tso_bug()

From: Michael Chan
Date: Fri Nov 03 2023 - 19:03:07 EST


On Fri, Nov 3, 2023 at 10:07 AM Alex Pakhunov
<alexey.pakhunov@xxxxxxxxxx> wrote:
> I'm not super familiar with the recommended approach for handling locks in
> network drivers, so I spent a bit of time looking at what tg3 does.
>
> It seems that there are a few ways to remove the race condition when
> working with these counters:
>
> 1. Use atomic increments. It is easy but every update is more expensive
> than it needs to be. We might be able to say that these specific
> counters are updated rarely, so maybe we don't care too much.
> 2. netif_tx_lock is already taken when tx_dropped is incremented - wrap
> rx_dropped increment and reading both counters in netif_tx_lock. This
> seems legal since tg3_tx() can take netif_tx_lock. I'm not sure how to
> order netif_tx_lock and tp->lock, since tg3_get_stats64() takes
> the latter. Should netif_tx_lock be taken inside tp->lock? Should they
> not be nested?
> 3. Using tp->lock to protect rx_dropped (tg3_poll_link() already takes it
> so it must be legal) and netif_tx_lock to protect tx_dropped.
>
> There are probably other options. Can you recommend an approach?

I recommend using per queue counters as briefly mentioned in my
earlier reply. Move the tx_dropped and rx_dropped counters to the per
queue tg3_napi struct. Incrementing tnapi->tx_dropped in
tg3_start_xmit() is serialized by the netif_tx_lock held by the stack.

Similarly, incrementing tnapi->rx_dropped in tg3_rx() is serialized by NAPI.

tg3_get_stats64() can just loop and sum all the tx_dropped and
rx_dropped counters in each tg3_napi struct. We don't worry about
locks here since we are just reading.
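
A rough sketch of the idea in plain userspace C, not the actual tg3
code. The sketch_* names, the struct layouts, and SKETCH_MAX_QUEUES are
hypothetical stand-ins for tg3_napi, tg3, and the driver's queue limit;
only the shape of the summing loop is the point.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_QUEUES 5	/* hypothetical limit, for illustration only */

/* Stand-in for the per-queue struct: each queue owns its own counters. */
struct sketch_napi {
	uint64_t tx_dropped;	/* written only from this queue's TX path */
	uint64_t rx_dropped;	/* written only from this queue's RX poll */
};

/* Stand-in for the device struct holding all queues. */
struct sketch_dev {
	struct sketch_napi napi[SKETCH_MAX_QUEUES];
	size_t num_queues;
};

struct sketch_stats {
	uint64_t tx_dropped;
	uint64_t rx_dropped;
};

/*
 * Sum the per-queue counters into the device-wide totals. The reader
 * takes no lock; it only reads, so the totals may lag slightly behind
 * concurrent increments.
 */
static void sketch_get_stats(const struct sketch_dev *dev,
			     struct sketch_stats *out)
{
	size_t i;

	out->tx_dropped = 0;
	out->rx_dropped = 0;

	for (i = 0; i < dev->num_queues; i++) {
		out->tx_dropped += dev->napi[i].tx_dropped;
		out->rx_dropped += dev->napi[i].rx_dropped;
	}
}
```

Each counter has a single writer context (the TX path under
netif_tx_lock, or the NAPI poll for RX), so the increments themselves
stay plain "counter++" with no atomics.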

>
> Also, this seems like a larger change that should be done separately from
> fixing the TX stall. Should we land just "[PATCH v2 2/2]"? Should we land
> the whole patch (since it does not make the race condition much worse) and
> fix the race condition separately?
>

Yes, we can merge patch #2 first which fixes the stall. Please repost
just patch #2 standalone if you want to do that. Thanks.
