Re: ath11k allocation failure on resume breaking wifi until power cycle

From: Vlastimil Babka
Date: Fri Feb 23 2024 - 10:28:42 EST


On 2/22/24 06:47, Manivannan Sadhasivam wrote:
> On Wed, Feb 21, 2024 at 08:34:23AM -0800, Jeff Johnson wrote:
>> On 2/21/2024 6:39 AM, Vlastimil Babka wrote:
>> > Hi,
>> >
>> > starting with 6.8 rc series, I'm experiencing problems on resume from s2idle
>> > on my laptop, which is Lenovo T14s Gen3:
>> >
>> > LENOVO 21CRS0K63K/21CRS0K63K, BIOS R22ET65W (1.35 )
>> > ath11k_pci 0000:01:00.0: wcn6855 hw2.1
>> > ath11k_pci 0000:01:00.0: chip_id 0x12 chip_family 0xb board_id 0xff soc_id 0x400c1211
>> > ath11k_pci 0000:01:00.0: fw_version 0x1106196e fw_build_timestamp 2024-01-12 11:30 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.37
>> >
>> > The problem is an allocation failure happening on resume from s2idle. After
>> > that the wifi stops working and even a reboot won't fix it, only a
>> > poweroff/poweron cycle of the laptop.
>> >
>
> Looks like WLAN is powered down during s2idle, which doesn't make sense. I hope
> Jeff will figure out what's going on.

You mean the firmware is supposed to power it down/up transparently without
kernel involvement? Because it should be powered down to save power, no?

But I just found out that when I build my own kernel using the distro config
as base but reduced by make localmodconfig, the "mhi mhi0: Requested to
power ON" and related messages don't occur anymore, so there's something
weird going on.

> But if you can share the dmesg after enabling the debug prints of both ath11k
> and MHI, it will help a lot.
>
> - Mani
>
>> > This is order 4 (costly order), GFP_NOIO (maybe it's originally GFP_KERNEL
>> > but we restrict to GFP_NOIO during resume) allocation, thus it's impossible
>> > to do memory compaction and the page allocator gives up. Such high-order
>> > allocations should have a fallback using smaller pages, or maybe it could at
>> > least retry once the restricted GFP_NOIO context is gone.
>> >
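
To make the numbers in the quoted paragraph concrete: assuming 4 KiB pages,
order 4 means 2^4 = 16 physically contiguous pages, i.e. 64 KiB, and anything
above order 3 counts as "costly" for the page allocator. The fallback idea can
be sketched in plain userspace C as below. This is only an illustration, not
ath11k or MHI code: chunked_buf, alloc_fn, alloc_small_only and
SKETCH_PAGE_BYTES are hypothetical names, malloc() stands in for the page
allocator, and whether the device can actually work with a non-contiguous
buffer is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 4 KiB base pages, as on x86-64. */
#define SKETCH_PAGE_BYTES 4096u

/* An order-N request covers 2^N contiguous pages. */
static size_t order_to_bytes(unsigned int order)
{
	return (size_t)SKETCH_PAGE_BYTES << order;
}

/* Hypothetical allocator hook, so a high-order failure can be simulated. */
typedef void *(*alloc_fn)(size_t bytes);

/* Hypothetical buffer made of equally sized chunks (one chunk if contiguous). */
struct chunked_buf {
	void **chunks;
	size_t nr_chunks;
	size_t chunk_bytes;
};

static void chunked_buf_free(struct chunked_buf *buf)
{
	for (size_t i = 0; i < buf->nr_chunks; i++)
		free(buf->chunks[i]);
	free(buf->chunks);
	buf->chunks = NULL;
	buf->nr_chunks = 0;
}

/*
 * Try the full order first; on failure halve the chunk size and retry,
 * down to single pages. Returns 0 on success, -1 if even order 0 fails.
 * The hook must return memory that free() can release.
 */
static int chunked_buf_alloc(struct chunked_buf *buf, unsigned int order,
			     alloc_fn alloc)
{
	assert(buf && alloc);

	for (unsigned int o = order;; o--) {
		size_t nr = (size_t)1 << (order - o);
		size_t i;

		buf->nr_chunks = 0;
		buf->chunks = calloc(nr, sizeof(*buf->chunks));
		if (!buf->chunks)
			return -1;
		buf->chunk_bytes = order_to_bytes(o);

		for (i = 0; i < nr; i++) {
			buf->chunks[i] = alloc(buf->chunk_bytes);
			if (!buf->chunks[i])
				break;
		}
		buf->nr_chunks = i;
		if (i == nr)
			return 0;

		/* Partial failure: release what we got and try a smaller order. */
		chunked_buf_free(buf);
		if (o == 0)
			return -1;
	}
}

/* Simulated allocator: anything larger than one page fails. */
static void *alloc_small_only(size_t bytes)
{
	return bytes > SKETCH_PAGE_BYTES ? NULL : malloc(bytes);
}
```

With alloc_small_only as the hook, an order-4 request ends up as 16 chunks of
4096 bytes; with plain malloc it stays a single 64 KiB chunk.
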
>> > I don't know why it never happened before 6.8, didn't spot anything obvious
>> > and it happens too unreliably to go bisect. Any idea?
>>
>> I've asked the development team to look at this, but in the interim can
>> you apply the two hibernation patchsets to see if those cleanups also
>> fix your problem:
>>
>> [PATCH 0/5] wifi: ath11k: prepare for hibernation support
>> https://lore.kernel.org/linux-wireless/20240221024725.10057-1-quic_bqiang@xxxxxxxxxxx
>>
>> [PATCH 0/3] wifi: ath11k: hibernation support
>> https://lore.kernel.org/linux-wireless/20240221030026.10553-1-quic_bqiang@xxxxxxxxxxx
>