Re: [git pull] drm for 6.8

From: Mario Limonciello
Date: Wed Jan 24 2024 - 11:42:08 EST


On 1/24/2024 10:24, Vlastimil Babka wrote:
On 1/24/24 16:31, Donald Carr wrote:
On Wed, Jan 24, 2024 at 7:06 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
When testing rc1 on my openSUSE Tumbleweed desktop, I've started
experiencing "frozen desktop" (KDE/Wayland) issues. The symptoms are that
everything freezes, including the mouse cursor. After a while it either
resolves, or e.g. Firefox crashes (if it was actively used when it froze), or
it's frozen for too long and I reboot with alt-sysrq-b. When it's frozen I can
still ssh to the machine, and there's nothing happening in dmesg.
The machine is based on an AMD Ryzen 7 2700 and a Radeon RX 7600.

I've bisected the merge commits so far and now will try to dig into this
one. I've noticed there was also a drm fixes PR later in the merge window but
since it was also merged into rc1 and thus didn't prevent the issue for me,
I guess it's not relevant here?

Because the reproduction wasn't very deterministic, I considered a commit bad
even if it didn't lead to a completely frozen desktop and a forced reboot.
Even the multi-second hangs that resolved were a regression compared to 6.7
anyway.

If there are known issues and perhaps candidate fixes already, please do tell.

I am experiencing the exact same symptoms: sddm (on weston) starts
perfectly, but launching a KDE Wayland session freezes at various points
(leading to plenty of premature celebration), usually on the
handoff from sddm to KDE (complete with a terminal cursor on screen).

Working perfectly as of the 6.7 final release, broken as of 6.8-rc1.
Sometimes sddm can be successfully restarted via ssh, other times
restarting sddm is slow and fails to complete.

Big thanks to Thorsten who suggested I look at the following:

https://lore.kernel.org/all/20240123021155.2775-1-mario.limonciello@xxxxxxx/

https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuOWtoeeE+q26zE+Q@xxxxxxxxxxxxxx/

Instead of further bisection I've applied Mario's revert from the first link
on top of 6.8-rc1 and the issue seems gone for me now.

Thanks for confirming. I don't think we should jump straight to the revert just yet. I posted it in case that is the direction we need to go (a simple git revert didn't work due to contextual changes).

Let's give the folks who work on the GPU scheduler some time to understand the failure and see if they can fix it.


Vlastimil

Yours sincerely,
Donald