Re: Radeon regression in 6.6 kernel

From: Alex Deucher
Date: Mon Nov 20 2023 - 12:31:53 EST


On Mon, Nov 20, 2023 at 11:24 AM Christian König
<christian.koenig@xxxxxxx> wrote:
>
> On 20.11.23 at 17:08, Alex Deucher wrote:
> > On Mon, Nov 20, 2023 at 10:57 AM Christian König
> > <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
> >> On 19.11.23 at 07:47, Dave Airlie wrote:
> >>>> On 12.11.23 01:46, Phillip Susi wrote:
> >>>>> I had been testing some things on a post 6.6-rc5 kernel for a week or
> >>>>> two and then when I pulled to a post 6.6 release kernel, I found that
> >>>>> system suspend was broken. It seems that the radeon driver failed to
> >>>>> suspend, leaving the display dead, the wayland display server hung, and
> >>>>> the system still running. I have been trying to bisect it for the last
> >>>>> few days and have only been able to narrow it down to the following 3
> >>>>> commits:
> >>>>>
> >>>>> There are only 'skip'ped commits left to test.
> >>>>> The first bad commit could be any of:
> >>>>> 56e449603f0ac580700621a356d35d5716a62ce5
> >>>>> c07bf1636f0005f9eb7956404490672286ea59d3
> >>>>> b70438004a14f4d0f9890b3297cd66248728546c
> >>>>> We cannot bisect more!
> >>>> Hmm, not a single reply from the amdgpu folks. Wondering how we can
> >>>> encourage them to look into this.
> >>>>
> >>>> Phillip, reporting issues by mail should still work, but you might have
> >>>> more luck here, as that's where the amdgpu developers afaics prefer to
> >>>> track bugs:
> >>>> https://gitlab.freedesktop.org/drm/amd/-/issues
> >>>>
> >>>> When you file an issue there, please mention it here.
> >>>>
> >>>> Furthermore it might help if you could verify if 6.7-rc1 (or rc2, which
> >>>> comes out later today) or 6.6.2-rc1 improve things.
> >>> It would also be good to test if reverting any of these is possible or not.
> >> Well none of the commits mentioned can affect radeon in any way. Radeon
> >> simply doesn't use the scheduler.
> >>
> >> My suspicion is that the user is actually using amdgpu instead of
> >> radeon. The switch potentially occurred accidentally, for example by
> >> compiling amdgpu support for SI/CIK.
> >>
> >> Those amdgpu problems for older ASICs have already been worked on and
> >> should be fixed by now.
> > In this case it's a navi23 (so radeon in the marketing sense).
>
> Thanks, couldn't find that in the mail thread.
>
> In that case those are the already known problems with the scheduler
> changes, aren't they?

Yes. Those changes went into 6.7 though, not 6.6 AFAIK. Maybe I'm
misunderstanding what the original report was actually testing. If it
was 6.7, then try reverting:
56e449603f0ac580700621a356d35d5716a62ce5
b70438004a14f4d0f9890b3297cd66248728546c

Alex

>
> Christian.
>
> >
> > Alex
> >
> >> Regards,
> >> Christian.
> >>
> >>> File the gitlab issue and we should poke amd a bit more to take a look.
> >>>
> >>> Dave.
>