Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")

From: Johan Hovold
Date: Tue Jan 23 2024 - 12:25:57 EST


On Mon, Jan 22, 2024 at 12:26:15PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:

> > I never got a reply to this one so resending with updated Subject in
> > case it got buried in your inbox.
>
> I did see it but decided it was better to fix the problem with resume
> causing an unintended reboot, even though fixing that meant breaking
> lockdep again, since I don't think we have user reports of the
> potential deadlock lockdep finds.

That may be because I fixed the previous regression in 6.7-rc1 before
any users had a chance to hit the deadlock on Qualcomm platforms.

I can easily trigger a deadlock on the X13s by instrumenting 6.7-final
with a delay to increase the race window.

And any user hitting this occasionally is likely not going to be able to
track it down to this lock inversion (unless they have lockdep enabled).

> 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a
> start at fixing other problems and also improving the ASPM style, so I
> hope somebody steps up to fix both it and the lockdep issue. I
> haven't looked at it enough to have a preference for *how* to fix it.

Ok, but since you were the one who introduced the locking regression in
6.7-final, shouldn't you look into fixing it?

Especially if there are alternatives to restoring the offending commit
that would fix the underlying resume failure without breaking other
platforms.

I don't want to spend more time on this if the offending commit could
simply be reverted.

Johan