Re: Radeon regression in 6.6 kernel

From: Bagas Sanjaya
Date: Sun Nov 12 2023 - 06:12:18 EST


On Sat, Nov 11, 2023 at 07:46:41PM -0500, Phillip Susi wrote:
> I had been testing some things on a post 6.6-rc5 kernel for a week or
> two and then when I pulled to a post 6.6 release kernel, I found that
> system suspend was broken. It seems that the radeon driver failed to
> suspend, leaving the display dead, the wayland display server hung, and
> the system still running. I have been trying to bisect it for the last
> few days and have only been able to narrow it down to the following 3
> commits:
>
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> 56e449603f0ac580700621a356d35d5716a62ce5
> c07bf1636f0005f9eb7956404490672286ea59d3
> b70438004a14f4d0f9890b3297cd66248728546c
> We cannot bisect more!

Please show the full bisect log, and also tell why these commits are
skipped.

>
> It appears that there was a late merge in the 6.6 window that originally
> forked from the -rc2, as many of the later commits that I bisected had
> that version number.
>
> I couldn't get it more narrowed down because I had to skip the
> surrounding commits because they wouldn't even boot up to a gui desktop,
> let alone try to suspend.
>
> When system suspend fails, I find the following in my syslog after I
> have to magic-sysrq reboot because the the display is dead:
>
> Nov 11 18:44:39 faldara kernel: PM: suspend entry (deep)
> Nov 11 18:44:39 faldara kernel: Filesystems sync: 0.035 seconds
> Nov 11 18:44:40 faldara kernel: Freezing user space processes
> Nov 11 18:44:40 faldara kernel: Freezing user space processes completed (elapsed 0.001 seconds)
> Nov 11 18:44:40 faldara kernel: OOM killer disabled.
> Nov 11 18:44:40 faldara kernel: Freezing remaining freezable tasks
> Nov 11 18:44:40 faldara kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> Nov 11 18:44:40 faldara kernel: printk: Suspending console(s) (use no_console_suspend to debug)
> Nov 11 18:44:40 faldara kernel: serial 00:01: disabled
> Nov 11 18:44:40 faldara kernel: e1000e: EEE TX LPI TIMER: 00000011
> Nov 11 18:44:40 faldara kernel: sd 4:0:0:0: [sdb] Synchronizing SCSI cache
> Nov 11 18:44:40 faldara kernel: sd 1:0:0:0: [sda] Synchronizing SCSI cache
> Nov 11 18:44:40 faldara kernel: sd 5:0:0:0: [sdc] Synchronizing SCSI cache
> Nov 11 18:44:40 faldara kernel: sd 4:0:0:0: [sdb] Stopping disk
> Nov 11 18:44:40 faldara kernel: sd 1:0:0:0: [sda] Stopping disk
> Nov 11 18:44:40 faldara kernel: sd 5:0:0:0: [sdc] Stopping disk
> Nov 11 18:44:40 faldara kernel: amdgpu: Move buffer fallback to memcpy unavailable
> Nov 11 18:44:40 faldara kernel: [TTM] Buffer eviction failed
> Nov 11 18:44:40 faldara kernel: [drm] evicting device resources failed
> Nov 11 18:44:40 faldara kernel: amdgpu 0000:03:00.0: PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x80 [amdgpu] returns -19
> Nov 11 18:44:40 faldara kernel: amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -19
> Nov 11 18:44:40 faldara kernel: amdgpu 0000:03:00.0: PM: failed to suspend async: error -19
> Nov 11 18:44:40 faldara kernel: PM: Some devices failed to suspend, or early wake event detected
> Nov 11 18:44:40 faldara kernel: xhci_hcd 0000:06:00.0: xHC error in resume, USBSTS 0x401, Reinit
> Nov 11 18:44:40 faldara kernel: usb usb3: root hub lost power or was reset
> Nov 11 18:44:40 faldara kernel: usb usb4: root hub lost power or was reset
> Nov 11 18:44:40 faldara kernel: serial 00:01: activated
> Nov 11 18:44:40 faldara kernel: nvme nvme0: 4/0/0 default/read/poll queues
> Nov 11 18:44:40 faldara kernel: ata8: SATA link down (SStatus 0 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata7: SATA link down (SStatus 0 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata1: SATA link down (SStatus 4 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata3: SATA link down (SStatus 4 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata4.00: configured for UDMA/133
> Nov 11 18:44:40 faldara kernel: OOM killer enabled.
> Nov 11 18:44:40 faldara kernel: Restarting tasks ... done.
> Nov 11 18:44:40 faldara kernel: random: crng reseeded on system resumption
> Nov 11 18:44:40 faldara kernel: PM: suspend exit
> Nov 11 18:44:40 faldara kernel: PM: suspend entry (s2idle)
> Nov 11 18:44:40 faldara systemd-networkd[384]: enp0s31f6: Gained IPv6LL
> Nov 11 18:44:40 faldara avahi-daemon[668]: Joining mDNS multicast group on interface enp0s31f6.IPv6 with address fe80::3ad5:47ff:fe0f:488a.
>
> My video card is this:
>
> 03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 (rev c7) (prog-if 00 [VGA controller])
> Subsystem: Gigabyte Technology Co., Ltd Navi 23
> Flags: bus master, fast devsel, latency 0, IRQ 139
> Memory at e0000000 (64-bit, prefetchable) [size=256M]
> Memory at f0000000 (64-bit, prefetchable) [size=2M]
> I/O ports at e000 [size=256]
> Memory at f7900000 (32-bit, non-prefetchable) [size=1M]
> Expansion ROM at 000c0000 [disabled] [size=128K]
> Capabilities: [48] Vendor Specific Information: Len=08 <?>
> Capabilities: [50] Power Management version 3
> Capabilities: [64] Express Legacy Endpoint, MSI 00
> Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
> Capabilities: [150] Advanced Error Reporting
> Capabilities: [200] Physical Resizable BAR
> Capabilities: [240] Power Budgeting <?>
> Capabilities: [270] Secondary PCI Express
> Capabilities: [2a0] Access Control Services
> Capabilities: [2d0] Process Address Space ID (PASID)
> Capabilities: [320] Latency Tolerance Reporting
> Capabilities: [410] Physical Layer 16.0 GT/s <?>
> Capabilities: [440] Lane Margining at the Receiver <?>
> Kernel driver in use: amdgpu
> Kernel modules: amdgpu

Anyway, thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: 56e449603f0ac5..b70438004a14f4

--
An old man doll... just what I always wanted! - Clara

Attachment: signature.asc
Description: PGP signature