Re: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

From: Kai-Heng Feng
Date: Wed Aug 11 2021 - 01:06:47 EST


On Wed, Aug 11, 2021 at 12:21 AM Lukas Wunner <lukas@xxxxxxxxx> wrote:
>
> On Tue, Aug 10, 2021 at 11:37:12PM +0800, Kai-Heng Feng wrote:
> > On Mon, Aug 9, 2021 at 11:00 PM Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > > If PME is not granted to the OS, the only consequence is that the PME
> > > port service is not instantiated at the root port. But PME is still
> > > enabled for downstream devices. Maybe that's a mistake? I think the
> > > ACPI spec is a little unclear what to do if PME control is *not* granted.
> > > It only specifies what to do if PME control is *granted*:
> >
> > So do you prefer to just disable runtime PM for the downstream device?
>
> I honestly don't know. I was just wondering whether it is okay
> to enable PME on devices if control is not granted by the firmware.
> The spec is fairly vague. But I guess the idea is that enabling PME
> on devices is correct, just handling the interrupts is done by firmware
> instead of the OS.

Does this imply that current ACPI doesn't handle this part?

>
> In your case, the endpoint device claims it can signal PME from D3cold,
> which is why we allow the root port above to runtime suspend to D3hot.
> The lspci output you've attached to the bugzilla indicates that yes,
> signaling PME in D3cold does work, but the PME interrupt is neither
> handled by the OS (because it's not allowed to) nor by firmware.
>
> So you would like to rely on PME polling instead, which only works if the
> root port remains in D0. Otherwise config space of the endpoint device
> is inaccessible.

The Windows approach is to keep the entire hierarchy in D0, which I
think may be a better way than relying on PME polling.

>
> I think the proper solution is that firmware should handle the PME
> interrupt. You've said the vendor objects because they found PME
> doesn't work reliably.

PME works; what the vendor said is that enabling PME makes the system
"unstable".

> Well in that case the endpoint device shouldn't
> indicate that it can signal PME, at least not from D3cold. Perhaps
> the vendor is able to change the endpoint device's config space so
> that it doesn't claim to support PME?

This is not a viable option, and we have to consider that BIOSes from
different vendors can exhibit the same behavior.

>
> If that doesn't work and thus a kernel patch is necessary, the next
> question is whether changing core code is the right approach.

I really don't see another way, because non-granted PME is a system-wide thing...

>
> If you do want to change core code, I'd suggest modifying
> pci_dev_check_d3cold() so that it blocks runtime PM on upstream
> bridges if PME is not handled natively AND firmware failed to enable
> the PME interrupt at the root port. The rationale is that upstream
> bridges need to remain in D0 so that PME polling is possible.

How do I know whether firmware failed to enable the PME IRQ?

And let me see how to make pci_dev_check_d3cold() work for this case.
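
To make sure I read the proposed condition correctly, here is a rough
user-space model of it. This is not kernel code: struct model_dev, its
fields and both functions are made up for illustration, and how to
obtain fw_pme_irq_enabled is exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the relevant state; not struct pci_dev. */
struct model_dev {
	struct model_dev *parent;	/* upstream bridge, NULL at the root port */
	bool wakeup_needed;		/* device is expected to signal wakeup */
	bool native_pme;		/* root port: OS was granted PME control */
	bool fw_pme_irq_enabled;	/* root port: assumed knowable, see above */
};

/* Walk up the hierarchy to the root port. */
static const struct model_dev *model_root_port(const struct model_dev *dev)
{
	assert(dev != NULL);
	while (dev->parent)
		dev = dev->parent;
	return dev;
}

/*
 * Return true if the bridges above @dev should stay in D0 so that
 * PME polling can still reach the device's config space.
 */
bool model_bridges_must_stay_d0(const struct model_dev *dev)
{
	const struct model_dev *root = model_root_port(dev);

	if (!dev->wakeup_needed)
		return false;

	/* Nobody handles the PME interrupt: neither the OS nor firmware. */
	return !root->native_pme && !root->fw_pme_irq_enabled;
}
```

With a root port that has neither native PME nor a firmware-armed PME
interrupt, an endpoint that needs wakeup makes this return true, so the
hierarchy would stay in D0; in every other combination it returns false.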

>
> An alternative would be a quirk for this specific laptop which clears
> pdev->pme_support.

This won't scale, because many models are affected.

Kai-Heng

>
> Thanks,
>
> Lukas