Re: PROBLEM: Wireless networking goes down on Acer C720P Chromebook (bisected)

From: Johannes Berg
Date: Thu Jan 02 2020 - 08:28:52 EST


On Tue, 2019-12-31 at 19:49 -0500, Stephen Oberholtzer wrote:
> Wireless networking goes down on Acer C720P Chromebook (bisected)
>
> Culprit: 7a89233a ("mac80211: Use Airtime-based Queue Limits (AQL) on
> packet dequeue")
>
> I found that the newest kernel (5.4) displayed a curious issue on my
> Acer C720P Chromebook: shortly after bringing networking up, all
> connections would suddenly fail. I discovered that I could
> consistently reproduce the issue by ssh'ing into the machine and
> running 'dmesg' -- on a non-working kernel, I would get partial
> output, and then the connection would completely hang. This was so
> consistent, in fact, that I was able to leverage it to automate the
> process from 'git bisect run'.
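
For anyone wanting to repeat the bisection, here is a minimal sketch of a probe that could be driven from 'git bisect run'. It is not the script used in the report. It assumes the kernel under test has already been built, installed and booted on the target by some other step, and the host name, timeout and function names are hypothetical. Only the exit-code convention (0 = good, 1 = bad, 125 = skip) comes from git bisect itself.

```python
#!/usr/bin/env python3
"""Probe a freshly booted kernel for the 'dmesg over ssh hangs' symptom."""
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by 'git bisect run'


def classify(returncode, timed_out):
    """Map the outcome of the ssh probe to a 'git bisect run' exit code."""
    if timed_out:
        return BAD   # the connection hung partway through the output
    if returncode == 0:
        return GOOD  # dmesg completed over the wireless link
    # Assumption: any other failure means the revision could not be tested.
    return SKIP


def probe(host, timeout_s=30):
    """Run dmesg on the target over ssh and classify the result."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "dmesg"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # a hang shows up as TimeoutExpired
        )
    except subprocess.TimeoutExpired:
        return classify(None, True)
    return classify(result.returncode, False)


def main(argv):
    # "c720p.example" is a placeholder host name, not taken from the report.
    host = argv[1] if len(argv) > 1 else "c720p.example"
    return probe(host)
```

Saved as an executable script that ends with sys.exit(main(sys.argv)), this could be passed to 'git bisect run' together with the host name. Treating every non-timeout failure as "skip" is a conservative choice. If the link dies before ssh can connect at all, that case would need to be mapped to "bad" instead.
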
>
> KEYWORDS: c720p, chromebook, wireless, networking, mac80211
>
> KERNEL: any kernel containing commit 7a89233a ("mac80211: Use
> Airtime-based Queue Limits (AQL) on packet dequeue")
>
> I find this bit in the offending commit's message suspicious:
>
> > This patch does *not* include any mechanism to wake a throttled TXQ again,
> > on the assumption that this will happen anyway as a side effect of whatever
> > freed the skb (most commonly a TX completion).
>
> Methinks this assumption is not a fully valid one.
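
To make the concern concrete, here is a toy model of an airtime-limited queue. This is not mac80211 code and every name and number in it is made up. It only illustrates the general failure mode: if a queue stops being serviced once the pending airtime reaches the limit, and nothing services it again when completions come in, the remaining backlog is never sent.

```python
from collections import deque


def simulate(frames_us, limit_us, wake_on_completion):
    """Return how many frames are left stuck in the backlog."""
    backlog = deque(frames_us)  # airtime estimate of each queued frame
    in_flight = deque()         # frames handed to the (imaginary) hardware
    pending_us = 0              # airtime queued in hardware, not yet completed

    def pull():
        nonlocal pending_us
        # Dequeue until the backlog is empty or the airtime limit is reached.
        while backlog and pending_us < limit_us:
            airtime = backlog.popleft()
            in_flight.append(airtime)
            pending_us += airtime

    pull()  # initial transmit attempt
    while in_flight:
        airtime = in_flight.popleft()  # a TX completion arrives
        # An underflow here would mean the accounting itself is wrong.
        assert pending_us >= airtime, "airtime accounting underflow"
        pending_us -= airtime
        if wake_on_completion:
            pull()  # the "side effect" the commit message relies on
    return len(backlog)
```

With five frames of 4000 us each and a 5000 us limit, simulate([4000] * 5, 5000, True) drains the backlog and returns 0, while simulate([4000] * 5, 5000, False) leaves three frames queued forever.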

I think I found at least one hole in this, but IIRC (it was before my
vacation, sorry) it was pretty unlikely to actually happen. Perhaps
there are more though.

https://lore.kernel.org/r/b14519e81b6d2335bd0cb7dcf074f0d1a4eec707.camel@xxxxxxxxxxxxxxxx


> I'll be happy to test any patches. If you need some printk calls, just
> tell me where to put 'em.

Do you get any output at all? Like a WARN_ON() for an underflow, or
something?

johannes