Re: [PATCH] PCI: Disable PTM during suspend on Intel PCI bridges

From: Rafael J. Wysocki
Date: Mon Nov 16 2020 - 12:53:23 EST


On Wed, Oct 7, 2020 at 7:10 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Wed, Oct 07, 2020 at 06:53:16PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Oct 7, 2020 at 6:49 PM David E. Box <david.e.box@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Intel Platform Controller Hubs (PCH) since Cannon Lake, the Precision
> > > Time Measurement (PTM) capability can prevent PCIe root ports from power
> > > gating during suspend-to-idle, causing increased power consumption on
> > > systems that suspend using Low Power S0 Idle [1]. The issue is yet to be
> > > root caused but believed to be coming from a race condition in the suspend
> > > flow as the incidence rate varies for different platforms on Linux but the
> > > issue does not occur at all in other operating systems. For now, disable
> > > the feature on suspend on all Intel root ports and enable again on resume.
> >
> > IMV it should also be noted that there is no particular reason why PTM
> > would need to be enabled while the whole system is suspended. At
> > least it doesn't seem to be particularly useful in that state.
>
> Is this a hardware erratum? If not, and this is working as designed,
> it sounds like we'd need to apply this quirk to every device that
> supports PTM. That's not really practical.

Why not?

It looks like the capability should be saved by pci_save_state() (it
isn't ATM, which appears to be a mistake) and restored by
pci_restore_state(), so if that is implemented, the saving can be
combined with the disabling in principle.
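
To illustrate the idea only, here is a minimal userspace sketch of "save the PTM Control register and clear the enable bit in one step, then write the saved value back on resume". It is not the kernel implementation: struct fake_dev and the cfg_read32()/cfg_write32()/ptm_save_and_disable()/ptm_restore() helpers are hypothetical stand-ins for a device and its config space accessors. The register offset and bit values are the ones the PCIe spec defines for the PTM extended capability.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* PTM Control register layout per the PCIe spec (PTM extended capability) */
#define PTM_CTRL         0x08         /* offset of PTM Control within the capability */
#define PTM_CTRL_ENABLE  0x00000001u  /* PTM Enable */
#define PTM_CTRL_ROOT    0x00000002u  /* Root Select */

/* Hypothetical model of a device: a flat config space plus saved state */
struct fake_dev {
	uint8_t  cfg[4096];       /* simulated extended config space */
	uint16_t ptm_cap;         /* offset of the PTM capability, 0 if absent */
	uint32_t saved_ptm_ctrl;  /* PTM Control value captured at suspend */
	int      ptm_saved;       /* nonzero once saved_ptm_ctrl is valid */
};

uint32_t cfg_read32(const struct fake_dev *d, uint16_t off)
{
	uint32_t v;

	assert(off + sizeof(v) <= sizeof(d->cfg));
	memcpy(&v, &d->cfg[off], sizeof(v));
	return v;
}

void cfg_write32(struct fake_dev *d, uint16_t off, uint32_t v)
{
	assert(off + sizeof(v) <= sizeof(d->cfg));
	memcpy(&d->cfg[off], &v, sizeof(v));
}

/* Suspend side: remember the current PTM Control value, then disable PTM */
void ptm_save_and_disable(struct fake_dev *d)
{
	uint32_t ctrl;

	if (!d->ptm_cap)
		return;  /* device has no PTM capability */

	ctrl = cfg_read32(d, d->ptm_cap + PTM_CTRL);
	d->saved_ptm_ctrl = ctrl;
	d->ptm_saved = 1;

	/* Assumption: clearing only the enable bit is enough for this sketch */
	cfg_write32(d, d->ptm_cap + PTM_CTRL, ctrl & ~PTM_CTRL_ENABLE);
}

/* Resume side: write back whatever was saved, re-enabling PTM if it was on */
void ptm_restore(struct fake_dev *d)
{
	if (!d->ptm_cap || !d->ptm_saved)
		return;

	cfg_write32(d, d->ptm_cap + PTM_CTRL, d->saved_ptm_ctrl);
	d->ptm_saved = 0;
}
```

With something along these lines, a device that had PTM enabled before suspend gets it back on resume, and a device that never had it enabled is left alone, with no per-device quirk involved.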

> The bugzilla says "there is no erratum as this does not affect
> Windows," but that doesn't answer the question. What I want to know
> is whether this is a *hardware* defect and whether it will be fixed in
> future hardware.

I cannot answer this question, sorry.

ATM we only know that certain SoCs may not enter the deepest idle
state if PTM is enabled on some PCIe root ports during suspend.

Disabling PTM on those ports while suspending helps, hence the patch.

It doesn't appear to qualify as a "hardware defect".

> If it's a "wont-fix" hardware issue, we can just disable PTM
> completely on Intel hardware and we won't have to worry about it
> during suspend.

I'm not following the logic here, sorry again.

First of all, there are systems that never suspend, so why would they
be affected by the remedy (whatever it is)?

Second, it is not about the suspend failing entirely. It's about
being able to make the system draw less power while suspended.

Generally, if someone said "I can make the system draw less power
while suspended if I disable PCIe feature X during suspend", would you
disregard that?