Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")

From: Johan Hovold
Date: Wed Jan 24 2024 - 03:16:42 EST


On Tue, Jan 23, 2024 at 04:36:48PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 23, 2024 at 06:25:52PM +0100, Johan Hovold wrote:
> > On Mon, Jan 22, 2024 at 12:26:15PM -0600, Bjorn Helgaas wrote:
> > > On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:

> > > 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a
> > > start at fixing other problems and also improving the ASPM style, so I
> > > hope somebody steps up to fix both it and the lockdep issue. I
> > > haven't looked at it enough to have a preference for *how* to fix it.
> >
> > Ok, but since you were the one introducing the locking regression in
> > 6.7-final shouldn't you look into fixing it?
> >
> > Especially if there were alternatives to restoring the offending commit
> > which would solve the underlying issue for the resume failure without
> > breaking other platforms.
>
> Did somebody propose an alternate patch? If so, I missed it, but we
> could look at it now.

I've only skimmed the discussion leading up to the revert, but I got the
impression that other alternatives were looked at as it was still not
clear what the underlying issue actually was.

As Michael and Thorsten pointed out before the revert, it may have been
better not to do a last minute revert of a 16 month old commit which
risks introducing regressions (and brought back another sysfs issue
IIUC) before fully understanding what is really going on here.

> > I don't want to spend more time on this if the offending commit could
> > simply be reverted.
>
> I don't quite follow. By simply reverting, do you mean to revert
> f93e71aea6c6 ("Revert "PCI/ASPM: Remove
> pcie_aspm_pm_state_change()"")? IIUC that would break Michael's
> machine again.

Right, at least until that issue is fully understood and alternative
fixes have been considered.

If that's not an option, we need to rework core to pass a flag through
more than one layer to indicate whether pcie_aspm_pm_state_change()
should take the bus semaphore or not. I'd rather not do that if it can
be avoided.

Johan