RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive

From: Souradeep Chakrabarti
Date: Mon Jul 03 2023 - 15:55:19 EST




>-----Original Message-----
>From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
>Sent: Monday, July 3, 2023 10:18 PM
>To: souradeep chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
><haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
><decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx;
>kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>; Ajay
>Sharma <sharmaajay@xxxxxxxxxxxxx>; leon@xxxxxxxxxx;
>cai.huoqing@xxxxxxxxx; ssengar@xxxxxxxxxxxxxxxxxxx; vkuznets@xxxxxxxxxx;
>tglx@xxxxxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx;
>linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
>stable@xxxxxxxxxxxxxxx; Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxx>
>Subject: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when
>host is unresponsive
>
>From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>Date: Mon, 3 Jul 2023 01:49:31 -0700
>
>> From: Souradeep Chakrabarti <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>
>Please sync your Git name and Git mail account settings, so that your own
>patches won't have "From:" when sending. From what I see, you need to
>correct first letters of name and surname to capital in the Git email settings
>block.
Thank you for pointing, I will fix it.
>
>>
>> When unloading the MANA driver, mana_dealloc_queues() waits for the
>> MANA hardware to complete any inflight packets and set the pending
>> send count to zero. But if the hardware has failed,
>> mana_dealloc_queues() could wait forever.
>>
>> Fix this by adding a timeout to the wait. Set the timeout to 120
>> seconds, which is a somewhat arbitrary value that is more than long
>> enough for functional hardware to complete any sends.
>>
>> Signed-off-by: Souradeep Chakrabarti
>> <schakrabarti@xxxxxxxxxxxxxxxxxxx>
>
>Where's "Fixes:" tagging the blamed commit?
This is present from the day zero of the mana driver code.
It has not been introduced in the code by any commit.
>
>> ---
>> V3 -> V4:
>> * Fixed the commit message to describe the context.
>> * Removed the vf_unload_timeout, as it is not required.
>> ---
>> drivers/net/ethernet/microsoft/mana/mana_en.c | 26
>> ++++++++++++++++---
>> 1 file changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
>> b/drivers/net/ethernet/microsoft/mana/mana_en.c
>> index a499e460594b..d26f1da70411 100644
>> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
>> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
>> @@ -2346,7 +2346,10 @@ static int mana_dealloc_queues(struct
>> net_device *ndev) {
>> struct mana_port_context *apc = netdev_priv(ndev);
>> struct gdma_dev *gd = apc->ac->gdma_dev;
>> + unsigned long timeout;
>> struct mana_txq *txq;
>> + struct sk_buff *skb;
>> + struct mana_cq *cq;
>> int i, err;
>>
>> if (apc->port_is_up)
>> @@ -2363,15 +2366,32 @@ static int mana_dealloc_queues(struct
>net_device *ndev)
>> * to false, but it doesn't matter since mana_start_xmit() drops any
>> * new packets due to apc->port_is_up being false.
>> *
>> - * Drain all the in-flight TX packets
>> + * Drain all the in-flight TX packets.
>> + * A timeout of 120 seconds for all the queues is used.
>> + * This will break the while loop when h/w is not responding.
>> + * This value of 120 has been decided here considering max
>> + * number of queues.
>> */
>> +
>> + timeout = jiffies + 120 * HZ;
>
>Why not initialize it right when declaring?
I will fix it in next version.
>
>> for (i = 0; i < apc->num_queues; i++) {
>> txq = &apc->tx_qp[i].txq;
>> -
>> - while (atomic_read(&txq->pending_sends) > 0)
>> + while (atomic_read(&txq->pending_sends) > 0 &&
>> + time_before(jiffies, timeout)) {
>> usleep_range(1000, 2000);> + }
>> }
>
>120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>iterations. I know usleep_range() often is much less precise, but still.
>Do you really need that much time? Has this been measured during the tests
>that it can take up to 120 seconds or is it just some random value that "should
>be enough"?
>If you really need 120 seconds, I'd suggest using a timer / delayed work instead
>of wasting resources.
Here the intent is not waiting for 120 seconds, rather than avoid continue checking the
pending_sends of each tx queues for an indefinite time, before freeing sk_buffs.
The pending_sends can only get decreased for a tx queue, if mana_poll_tx_cq()
gets called for a completion notification and increased by xmit.

In this particular bug, apc->port_is_up is not set to false, causing
xmit to keep increasing the pending_sends for the queue and mana_poll_tx_cq()
not getting called for the queue.

If we see the comment in the function mana_dealloc_queues(), it mentions it :

2346 /* No packet can be transmitted now since apc->port_is_up is false.
2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
2348 * a txq because it may not timely see apc->port_is_up being cleared
2349 * to false, but it doesn't matter since mana_start_xmit() drops any
2350 * new packets due to apc->port_is_up being false.

The value 120 seconds has been decided here based on maximum number of queues
are allowed in this specific hardware, it is a safe assumption.
>
>>
>> + for (i = 0; i < apc->num_queues; i++) {
>> + txq = &apc->tx_qp[i].txq;
>> + cq = &apc->tx_qp[i].tx_cq;
>
>cq can be just &txq->tx_cq.
mana_txq structure does not have a pointer to mana_cq.
>
>> + while (atomic_read(&txq->pending_sends)) {
>> + skb = skb_dequeue(&txq->pending_skbs);
>> + mana_unmap_skb(skb, apc);
>> + napi_consume_skb(skb, cq->budget);
>
>(you already have comment about this one)
>
>> + atomic_sub(1, &txq->pending_sends);
>> + }
>> + }
>> /* We're 100% sure the queues can no longer be woken up, because
>> * we're sure now mana_poll_tx_cq() can't be running.
>> */
>
>Thanks,
>Olek