Re: regression/bisected/6.8 commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b prevents the system going into suspend mode

From: Alexey Dobriyan
Date: Fri Mar 08 2024 - 01:15:57 EST


On Fri, Mar 08, 2024 at 02:22:03AM +0500, Mikhail Gavrilov wrote:
> on one of my systems, commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b
> prevents the system going into suspend mode.

> Every time I tried to switch to suspend mode I saw these messages in the log:
> [ 117.596548] xhci_hcd 0000:12:00.3: PM: pci_pm_suspend():
> hcd_pci_suspend+0x0/0x20 returns -16
> [ 117.596569] xhci_hcd 0000:12:00.3: PM: dpm_run_callback():
> pci_pm_suspend+0x0/0x4e0 returns -16
> [ 117.596583] xhci_hcd 0000:12:00.3: PM: failed to suspend async: error -16
> [ 118.295894] PM: Some devices failed to suspend, or early wake event detected
> [ 118.301032] xhci_hcd 0000:10:00.0: xHC error in resume, USBSTS 0x401, Reinit
> [ 118.301129] usb usb1: root hub lost power or was reset
> [ 118.301132] usb usb2: root hub lost power or was reset
> [ 118.301868] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
> [ 118.302115] [drm] PSP is resuming...
> [ 118.336045] [drm] reserve 0x1300000 from 0x85fc000000 for PSP TMR
> [ 118.374741] xone-dongle 3-1.1:1.0: xone_mt76_resume_radio: resumed
> [ 118.377527] nvme nvme0: 31/0/0 default/read/poll queues
> [ 118.379470] nvme nvme1: 32/0/0 default/read/poll queues
> [ 118.493231] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode
> is not available
> [ 118.493237] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY:
> securedisplay ta ucode is not available
> [ 118.493241] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
> [ 118.493245] amdgpu 0000:03:00.0: amdgpu: smu driver if version =
> 0x0000003d, smu fw if version = 0x0000003f, smu fw program = 0, smu fw
> version = 0x004e7900 (78.121.0)
> [ 118.493248] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
> [ 118.609941] ata3: SATA link down (SStatus 0 SControl 300)
> [ 118.610052] ata4: SATA link down (SStatus 0 SControl 300)
> [ 118.610154] ata2: SATA link down (SStatus 0 SControl 300)
> [ 118.610174] ata1: SATA link down (SStatus 0 SControl 300)
> [ 118.690018] usb 1-12: reset high-speed USB device number 4 using xhci_hcd
> [ 119.067818] usb 1-10: reset high-speed USB device number 3 using xhci_hcd
> [ 119.442726] usb 1-6: reset full-speed USB device number 2 using xhci_hcd
> [ 122.034768] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with
> your previous command: SMN_C2PMSG_66:0x00000006
> SMN_C2PMSG_82:0x00000000
> [ 122.034779] amdgpu 0000:03:00.0: amdgpu: Failed to enable requested
> dpm features!
> [ 122.034780] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
> [ 122.034782] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
> resume of IP block <smu> failed -62
> [ 122.034975] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume
> failed (-62).
> [ 122.034984] amdgpu 0000:03:00.0: PM: dpm_run_callback():
> pci_pm_resume+0x0/0x200 returns -62
> [ 122.034990] amdgpu 0000:03:00.0: PM: failed to resume async: error -62
> [ 122.042111] OOM killer enabled.
> [ 122.042115] Restarting tasks ... done.
>
> So I tried to find which commit borked it.
> And I successfully found it:
>
> 5d390df3bdd13d178eb2e02e60e9a480f7103f7b is the first bad commit
> commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b
> Author: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> Date: Tue Jan 23 13:40:00 2024 +0300
>
> smb: client: delete "true", "false" defines
>
> Kernel has its own official true/false definitions.
>
> The defines aren't even used in this file.
>
> Signed-off-by: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> Signed-off-by: Steve French <stfrench@xxxxxxxxxxxxx>
>
> fs/smb/client/smbencrypt.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> I am convinced that suspend mode started working again after reverting
> commit 5d390df3bdd13d178eb2e02e60e9a480f7103f7b on top of 6.8-rc7.
>
> I attached the bisect log and the kernel logs from each step here.
> The build config is attached as well.
>
> Alexey, can you look into it?

What? Deleting unused defines breaks suspend?

Build fs/smb/client/smbencrypt.o with and without the patch and
check that the two object files are identical.

The enum in include/linux/stddef.h is

	enum {
		false	= 0,
		true	= 1,
	};

so even if the defines were used somehow, they would expand to the same
values of the same type.

Something else is going on.