Re: [PATCH] wifi: iwlwifi: Fix spurious packet drops with RSS

From: Johannes Berg
Date: Fri May 05 2023 - 02:40:39 EST


On Thu, 2023-05-04 at 10:55 -0700, Sultan Alsawaf wrote:
> >
> > So I assume you tested it now, and it works? Somehow I had been under
> > the impression we never got it to work back when...
>
> Yep, I've been using this for about a year and have let it run through the
> original iperf3 reproducer I mentioned on bugzilla for hours with no stalls. My
> big git clones don't freeze anymore either. :)

Oh! OK, great.

> What I wasn't able to get working was the big reorder buffer cleanup that's made
> possible by using these firmware bits. The explicit queue sync can be removed
> easily, but there were further potential cleanups you had mentioned that I
> wasn't able to get working.

Fair enough.

> I hadn't submitted this patch until now because I was hoping to get the big
> cleanup done simultaneously, but I've been too busy. Since this small patch
> does fix the issue, my thought is that this could be merged and sent to stable,
> and with subsequent patches I can chip away at cleaning up the reorder buffer.

Sure, that makes sense.

> > > Johannes mentions that the 9000 series' firmware doesn't support these
> > > bits, so disable RSS on the 9000 series chipsets since they lack a
> > > mechanism to properly detect old and duplicated frames.
> >
> > Indeed, I checked this again. I also somehow thought it was backported
> > to some versions, but it doesn't look like it. We can either leave those old
> > ones broken (they only shipped with fewer cores anyway), or just disable
> > it as you did here, not sure. RSS is probably not as relevant with those
> > slower speeds anyway.
>
> Agreed, I think it's worth disabling RSS on 9000 series to fix it there. If the
> RX queues are heavily backed up and incoming packets are not released fast
> enough due to a slow CPU, then I think the spurious drops could happen somewhat
> regularly on slow devices using 9000 series.
>
> It's probably also difficult to judge the impact/frequency of these spurious
> drops in the wild due to TCP retries potentially masking them. The issue can be
> very noticeable when a lot of packets are spuriously dropped at once though, so
> I think it's certainly worth the tradeoff to disable RSS on the older chipsets.

:)
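
To illustrate how a backed-up RX queue could produce such spurious drops,
here is a minimal sketch. It assumes the "is this frame old?" decision is a
modular comparison of 12-bit 802.11 sequence numbers, which is an assumption
for illustration and not something stated in this thread. The names sn_less()
and looks_old() are hypothetical and not taken from the patch or the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* 802.11 sequence numbers are 12 bits wide, so they wrap at 4096. */
#define SN_MODULO 4096u
#define SN_MASK   (SN_MODULO - 1u)

/*
 * Hypothetical helper: true if sequence number a is "before" b when the
 * difference is interpreted modulo 4096 with a half-space window of 2048.
 */
static bool sn_less(uint16_t a, uint16_t b)
{
	return ((uint16_t)(a - b) & SN_MASK) > (SN_MODULO / 2);
}

/*
 * Hypothetical check: would a frame with sequence number sn be treated as
 * old (and dropped) by a queue whose reorder window currently starts at head?
 */
static bool looks_old(uint16_t sn, uint16_t head)
{
	return sn_less(sn, head);
}
```

Under these assumptions, looks_old(90, 100) is true as expected, but so is
looks_old(3000, 0): a queue that has fallen more than 2048 frames behind
cannot tell a genuinely new frame from an old one by comparison alone, which
is where a separate indication of old and duplicated frames would help.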

> Indeed, and removing the queue sync + timer is easy. Would you prefer I send
> additional patches for at least those cleanups before the fix itself can be
> considered for merging?
>

No, you know, maybe this is easier since it's the smallest possible
change that fixes issues. Just have to see what Emmanuel says, he had
said he sees issues with this change.

johannes