Re: Radeon regression in 6.6 kernel

From: Christian König
Date: Mon Nov 20 2023 - 10:57:44 EST


On 19.11.23 at 07:47, Dave Airlie wrote:
>> On 12.11.23 01:46, Phillip Susi wrote:
>>> I had been testing some things on a post-6.6-rc5 kernel for a week or
>>> two, and when I pulled to a post-6.6-release kernel, I found that
>>> system suspend was broken. It seems that the radeon driver failed to
>>> suspend, leaving the display dead, the Wayland display server hung, and
>>> the system still running. I have been trying to bisect it for the last
>>> few days and have only been able to narrow it down to the following 3
>>> commits:
>>>
>>> There are only 'skip'ped commits left to test.
>>> The first bad commit could be any of:
>>> 56e449603f0ac580700621a356d35d5716a62ce5
>>> c07bf1636f0005f9eb7956404490672286ea59d3
>>> b70438004a14f4d0f9890b3297cd66248728546c
>>> We cannot bisect more!
>> Hmm, not a single reply from the amdgpu folks. Wondering how we can
>> encourage them to look into this.
>>
>> Phillip, reporting issues by mail should still work, but you might have
>> more luck here, as that's where the amdgpu developers afaics prefer to track bugs:
>> https://gitlab.freedesktop.org/drm/amd/-/issues
>>
>> When you file an issue there, please mention it here.
>>
>> Furthermore, it might help if you could verify whether 6.7-rc1 (or rc2, which
>> comes out later today) or 6.6.2-rc1 improves things.
> It would also be good to test whether reverting any of these is possible or not.

Well, none of the commits mentioned can affect radeon in any way. Radeon simply doesn't use the GPU scheduler (drm_sched).

My suspicion is that the user is actually using amdgpu instead of radeon. The switch may have happened accidentally, for example by enabling amdgpu support for SI/CIK when building the kernel.
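One way to check which driver is actually bound to the GPU is to resolve the sysfs `driver` symlink of each DRM card. `lspci -k` reports the same thing on its "Kernel driver in use" line. The Python sketch below assumes sysfs is mounted at `/sys` with the usual `/sys/class/drm/cardN/device/driver` layout. The function name `drm_drivers` is hypothetical and not part of any kernel tool. On SI/CIK hardware, a result of `amdgpu` rather than `radeon` would support the suspicion above. Those older ASICs are handled by amdgpu when the kernel is built with `CONFIG_DRM_AMDGPU_SI` or `CONFIG_DRM_AMDGPU_CIK`.

```python
import os
from pathlib import Path


def drm_drivers(sysfs_drm: str = "/sys/class/drm") -> dict:
    """Map each DRM card (e.g. 'card0') to the name of the kernel driver bound to it.

    Hypothetical helper; assumes the standard sysfs layout under sysfs_drm.
    """
    result = {}
    base = Path(sysfs_drm)
    if not base.is_dir():
        return result  # no DRM devices visible (or sysfs not mounted)
    for entry in sorted(base.iterdir()):
        name = entry.name
        # Keep only top-level cards such as 'card0'; skip connectors
        # like 'card0-DP-1' and render nodes like 'renderD128'.
        if not name.startswith("card") or "-" in name:
            continue
        driver_link = entry / "device" / "driver"
        if driver_link.exists():
            # 'driver' is a symlink to .../bus/<bus>/drivers/<driver-name>,
            # so the last path component of its target is the driver name.
            result[name] = os.path.basename(os.path.realpath(driver_link))
    return result


if __name__ == "__main__":
    for card, driver in drm_drivers().items():
        print(f"{card}: {driver}")  # e.g. "card0: radeon" or "card0: amdgpu"
```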

Those amdgpu problems with older ASICs have already been worked on and should be fixed by now.

Regards,
Christian.


> File the gitlab issue and we should poke AMD a bit more to take a look.
>
> Dave.