Re: [PATCH 1/2] PCI/ASPM: Disable ASPM until its LTR and L1ss state is restored

From: Bjorn Helgaas
Date: Wed Jan 13 2021 - 21:18:57 EST


On Wed, Jan 13, 2021 at 01:16:05PM +1100, Victor Ding wrote:
> On Wed, Jan 13, 2021 at 9:32 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Tue, Jan 12, 2021 at 04:02:04AM +0000, Victor Ding wrote:
> > > Right after powering up, the device may have ASPM enabled; however,
> > > its LTR and/or L1ss controls may not be in the desired states; hence,
> > > the device may enter L1.2 undesirably and cause resume performance
> > > penalty. This is especially problematic if ASPM related control
> > > registers are modified before a suspension.
> >
> > s/L1ss/L1SS/ (several occurrences) to match existing usage.
> >
> I'll update it in the next version.
> >
> > > Therefore, ASPM should disabled until its LTR and L1ss states are
> > > fully restored.
> > >
> > > Signed-off-by: Victor Ding <victording@xxxxxxxxxx>
> > > ---
> > >
> > > drivers/pci/pci.c | 11 +++++++++++
> > > drivers/pci/pci.h | 2 ++
> > > drivers/pci/pcie/aspm.c | 2 +-
> > > 3 files changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index eb323af34f1e..428de433f2e6 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -1660,6 +1660,17 @@ void pci_restore_state(struct pci_dev *dev)
> > > if (!dev->state_saved)
> > > return;
> > >
> > > + /*
> > > + * Right after powering up, the device may have ASPM enabled;
> >
> > I think this could be stated more precisely. "Right after powering
> > up," ASPM should be *disabled* because ASPM Control in the Link
> > Control register powers up as zero (disabled).
> >
> Good suggestion; I'll reword in the next version.
> However, ASPM should be *disabled* for the opposite reason - ASPM Control
> in the Link Control register *may* power as non-zero (enabled).
> (More answered in the next section)
> >
> > > + * however, its LTR and/or L1ss controls may not be in the desired
> > > + * states; as a result, the device may enter L1.2 undesirably and
> > > + * cause resume performance penalty.
> > > + * Therefore, ASPM is disabled until its LTR and L1ss states are
> > > + * fully restored.
> > > + * (enabling ASPM is part of pci_restore_pcie_state)
> >
> > If we're enabling L1.2 before LTR has been configured, that's
> > definitely a bug. But I don't see how that happens. The current code
> > looks like:
> >
> > pci_restore_state
> > pci_restore_ltr_state
> > pci_write_config_word(dev, ltr + PCI_LTR_MAX_SNOOP_LAT, *cap++)
> > pci_write_config_word(dev, ltr + PCI_LTR_MAX_NOSNOOP_LAT, *cap++)
> > pci_restore_aspm_l1ss_state
> > pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++)
> > pci_restore_pcie_state
> > pcie_capability_write_word(dev, PCI_EXP_LNKCTL, cap[i++])
> >
> > We *do* restore PCI_L1SS_CTL1, which contains "ASPM L1.2 Enable",
> > before we restore PCI_EXP_LNKCTL, which contains "ASPM Control", but I
> > don't think "ASPM L1.2 Enable" by itself should be enough to allow
> > hardware to enter L1.2.
> >
> > Reading PCIe r5.0, sec 5.5.1, it seems like hardware should ignore the
> > PCI_L1SS_CTL1 bits unless ASPM L1 entry is enabled in PCI_EXP_LNKCTL.
> >
> > What am I missing?
> >
> Correct; however, PCI_EXP_LNKCTL may power up as non-zero, i.e. has
> ASPM Control enabled. BIOS may set PCI_EXP_LNKCTL (and
> PCI_L1SS_CTL1) before Kernel takes control. When BIOS does so, there
> is a brief period between powering up and Kernel restoring LTR state
> that L1.2 is enabled but LTR isn't configured.

IIUC you're saying that ASPM Control powers up as zero, but BIOS
enables ASPM before transferring control to the OS. That makes sense,
but I wouldn't describe that as "the device powering up with ASPM
enabled."

If BIOS enables L1SS (specifically, if it enables L1.2) when LTR
hasn't been configured, that sounds like a BIOS defect.