Re: [PATCH] wifi: iwlwifi: Fix spurious packet drops with RSS

From: Johannes Berg
Date: Thu May 04 2023 - 08:11:11 EST


[let's see if my reply will make it to the list, the original seems to
not have]

On Sun, 2023-04-30 at 00:13 +0000, Sultan Alsawaf wrote:
> From: Sultan Alsawaf <sultan@xxxxxxxxxxxxxxx>
>
> When RSS is used and one of the RX queues lags behind others by more than
> 2048 frames, then new frames arriving on the lagged RX queue are
> incorrectly treated as old rather than new by the reorder buffer, and are
> thus spuriously dropped. This is because the reorder buffer treats frames
> as old when they have an SN that is more than 2048 away from the head SN,
> which causes the reorder buffer to drop frames that are actually valid.
>
> The odds of this occurring naturally increase with the number of
> RX queues used, so CPUs with many threads are more susceptible to
> encountering spurious packet drops caused by this issue.
>
> As it turns out, the firmware already detects when a frame is either old or
> duplicated and exports this information, but it's currently unused. Using
> these firmware bits to decide when frames are old or duplicated fixes the
> spurious drops.

So I assume you tested it now, and it works? Somehow I had been under
the impression we never got it to work back when...

> Johannes mentions that the 9000 series' firmware doesn't support these
> bits, so disable RSS on the 9000 series chipsets since they lack a
> mechanism to properly detect old and duplicated frames.

Indeed, I checked this again, I also somehow thought it was backported
to some versions but doesn't look like. We can either leave those old
ones broken (they only shipped with fewer cores anyway), or just disable
it as you did here, not sure. RSS is probably not as relevant with those
slower speeds anyway.

> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
> @@ -918,7 +918,6 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm,
> struct iwl_mvm_sta *mvm_sta;
> struct iwl_mvm_baid_data *baid_data;
> struct iwl_mvm_reorder_buffer *buffer;
> - struct sk_buff *tail;
> u32 reorder = le32_to_cpu(desc->reorder_data);
> bool amsdu = desc->mac_flags2 & IWL_RX_MPDU_MFLG2_AMSDU;
> bool last_subframe =
> @@ -1020,7 +1019,7 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm,
> rx_status->device_timestamp, queue);
>
> /* drop any oudated packets */
> - if (ieee80211_sn_less(sn, buffer->head_sn))
> + if (reorder & IWL_RX_MPDU_REORDER_BA_OLD_SN)
> goto drop;
>
> /* release immediately if allowed by nssn and no stored frames */
> @@ -1068,24 +1067,12 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm,
> return false;
> }

All that "send queue sync" code in the middle that was _meant_ to fix
this issue but I guess never really did can also be removed, no? And the
timer, etc. etc.

johannes

[leaving full quote for the benefit of the mailing list]

>
> - index = sn % buffer->buf_size;
> -
> - /*
> - * Check if we already stored this frame
> - * As AMSDU is either received or not as whole, logic is simple:
> - * If we have frames in that position in the buffer and the last frame
> - * originated from AMSDU had a different SN then it is a retransmission.
> - * If it is the same SN then if the subframe index is incrementing it
> - * is the same AMSDU - otherwise it is a retransmission.
> - */
> - tail = skb_peek_tail(&entries[index].e.frames);
> - if (tail && !amsdu)
> - goto drop;
> - else if (tail && (sn != buffer->last_amsdu ||
> - buffer->last_sub_index >= sub_frame_idx))
> + /* drop any duplicated packets */
> + if (desc->status & cpu_to_le32(IWL_RX_MPDU_STATUS_DUPLICATE))
> goto drop;
>
> /* put in reorder buffer */
> + index = sn % buffer->buf_size;
> __skb_queue_tail(&entries[index].e.frames, skb);
> buffer->num_stored++;
> entries[index].e.reorder_time = jiffies;
> --
> 2.40.1
>