Re: [BUG] brcmfmac locks up on resume from suspend

From: Dmitry Osipenko
Date: Tue Aug 03 2021 - 11:35:03 EST


22.06.2021 20:04, Dmitry Osipenko wrote:
> 18.06.2021 23:36, Dmitry Osipenko wrote:
>> Hi,
>>
>> I'm getting a hang on resume from suspend using today's next-20210618.
>> It's tested on a Tegra20 Acer A500 that has an older BCM4329, but the
>> problem seems to be generic.
>>
>> There is this line in pstore log:
>>
>> ieee80211 phy0: brcmf_netdev_start_xmit: xmit rejected state=0
>>
>> Steps to reproduce:
>>
>> 1. Boot system
>> 2. Connect WiFi
>> 3. Run "rtcwake -s10 -mmem"
>>
>> What's interesting is that turning WiFi off/on before suspending makes
>> resume work, with no suspicious messages in KMSG; all further
>> resumes work too.
>>
>> This change fixes the hang:
>>
>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>> index db5f8535fdb5..06d16f7776c7 100644
>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
>> @@ -301,7 +301,6 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
>>  	/* Can the device send data? */
>>  	if (drvr->bus_if->state != BRCMF_BUS_UP) {
>>  		bphy_err(drvr, "xmit rejected state=%d\n", drvr->bus_if->state);
>> -		netif_stop_queue(ndev);
>>  		dev_kfree_skb(skb);
>>  		ret = -ENODEV;
>>  		goto done;
>> 8<---
>>
>> Comments? Suggestions? Thanks in advance.
>>
>
> Update:
>
> After some more testing I found that removing netif_stop_queue() doesn't really help; apparently it was a coincidence.
>
> I enabled CONFIG_BRCMDBG and added dump_stack() to the error condition of brcmf_netdev_start_xmit() and this is what it shows:
>
> PM: suspend entry (s2idle)
> Filesystems sync: 0.000 seconds
> Freezing user space processes ... (elapsed 0.004 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
...
The hang has been resolved by bumping the Tegra SoC core voltage,
so it wasn't related to the BCM chip.

The "xmit rejected" error is still there, but it's not fatal AFAICS.