Re: [PATCH v2 2/2] PCI: Fix runtime PM race with PME polling

From: Lukas Wunner
Date: Tue Jan 23 2024 - 11:13:18 EST


On Tue, Jan 23, 2024 at 08:55:21AM -0700, Alex Williamson wrote:
> On Tue, 23 Jan 2024 11:45:19 +0100 Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > If the device is RPM_SUSPENDING, why immediately resume it for polling?
> > It's sufficient to poll it the next time around, i.e. 1 second later.
> >
> > Likewise, if it's already RPM_RESUMING or RPM_ACTIVE anyway, no need
> > to poll PME.
>
> I'm clearly not an expert on PME, but this is not obvious to me and
> before the commit that went in through this thread, PME wakeup was
> triggered regardless of the PM state. I was trying to restore the
> behavior of not requiring a specific PM state other than deferring
> polling across transition states.

There are broken devices which are incapable of signaling PME.
As a workaround, the kernel polls these devices once per second.
The first time the device signals PME, the kernel stops polling
that particular device because PME is clearly working.

So this is just a best-effort way to support PME for broken devices.
If it takes a little longer to detect that PME was signaled, it's not
a big deal.


> The issue I'm trying to address is that config space of the device can
> become inaccessible while calling pci_pme_wakeup() on it, causing a
> system fault on some hardware. So a gratuitous pci_pme_wakeup() can be
> detrimental.
>
> We require the device config space to remain accessible, therefore the
> instantaneous test against D3cold and that the parent bridge is in D0
> is not sufficient. I see traces where the parent bridge is in D0, but
> the PM state is RPM_SUSPENDING and the endpoint device transitions to
> D3cold while we're executing pci_pme_wakeup().

We have pci_config_pm_runtime_{get,put}() helpers to ensure the parent
of a device is in D0 so that the device's config space is accessible.
So you may need to use that in pci_pme_wakeup().

Thanks,

Lukas