Re: [PATCH 1/2] PCI/ASPM: Disable ASPM until its LTR and L1ss state is restored

From: Victor Ding
Date: Thu Jan 14 2021 - 04:15:37 EST


On Thu, Jan 14, 2021 at 7:54 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Wed, Jan 13, 2021 at 01:16:05PM +1100, Victor Ding wrote:
> > On Wed, Jan 13, 2021 at 9:32 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Tue, Jan 12, 2021 at 04:02:04AM +0000, Victor Ding wrote:
> > > > Right after powering up, the device may have ASPM enabled; however,
> > > > its LTR and/or L1ss controls may not be in the desired states; hence,
> > > > the device may enter L1.2 undesirably and cause resume performance
> > > > penalty. This is especially problematic if ASPM related control
> > > > registers are modified before a suspension.
> > >
> > > s/L1ss/L1SS/ (several occurrences) to match existing usage.
> > >
> > I'll update it in the next version.
> > >
> > > > Therefore, ASPM should disabled until its LTR and L1ss states are
> > > > fully restored.
> > > >
> > > > Signed-off-by: Victor Ding <victording@xxxxxxxxxx>
> > > > ---
> > > >
> > > > drivers/pci/pci.c | 11 +++++++++++
> > > > drivers/pci/pci.h | 2 ++
> > > > drivers/pci/pcie/aspm.c | 2 +-
> > > > 3 files changed, 14 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index eb323af34f1e..428de433f2e6 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -1660,6 +1660,17 @@ void pci_restore_state(struct pci_dev *dev)
> > > > if (!dev->state_saved)
> > > > return;
> > > >
> > > > + /*
> > > > + * Right after powering up, the device may have ASPM enabled;
> > >
> > > I think this could be stated more precisely. "Right after powering
> > > up," ASPM should be *disabled* because ASPM Control in the Link
> > > Control register powers up as zero (disabled).
> > >
> > Good suggestion; I'll reword in the next version.
> > However, ASPM should be *disabled* for the opposite reason - ASPM Control
> > in the Link Control register *may* power as non-zero (enabled).
> > (More answered in the next section)
> > >
> > > > + * however, its LTR and/or L1ss controls may not be in the desired
> > > > + * states; as a result, the device may enter L1.2 undesirably and
> > > > + * cause resume performance penalty.
> > > > + * Therefore, ASPM is disabled until its LTR and L1ss states are
> > > > + * fully restored.
> > > > + * (enabling ASPM is part of pci_restore_pcie_state)
> > >
> > > If we're enabling L1.2 before LTR has been configured, that's
> > > definitely a bug. But I don't see how that happens. The current code
> > > looks like:
> > >
> > > pci_restore_state
> > > pci_restore_ltr_state
> > > pci_write_config_word(dev, ltr + PCI_LTR_MAX_SNOOP_LAT, *cap++)
> > > pci_write_config_word(dev, ltr + PCI_LTR_MAX_NOSNOOP_LAT, *cap++)
> > > pci_restore_aspm_l1ss_state
> > > pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++)
> > > pci_restore_pcie_state
> > > pcie_capability_write_word(dev, PCI_EXP_LNKCTL, cap[i++])
> > >
> > > We *do* restore PCI_L1SS_CTL1, which contains "ASPM L1.2 Enable",
> > > before we restore PCI_EXP_LNKCTL, which contains "ASPM Control", but I
> > > don't think "ASPM L1.2 Enable" by itself should be enough to allow
> > > hardware to enter L1.2.
> > >
> > > Reading PCIe r5.0, sec 5.5.1, it seems like hardware should ignore the
> > > PCI_L1SS_CTL1 bits unless ASPM L1 entry is enabled in PCI_EXP_LNKCTL.
> > >
> > > What am I missing?
> > >
> > Correct; however, PCI_EXP_LNKCTL may power up as non-zero, i.e. has
> > ASPM Control enabled. BIOS may set PCI_EXP_LNKCTL (and
> > PCI_L1SS_CTL1) before Kernel takes control. When BIOS does so, there
> > is a brief period between powering up and Kernel restoring LTR state
> > that L1.2 is enabled but LTR isn't configured.
>
> IIUC you're saying that ASPM Control powers up as zero, but BIOS
> enables ASPM before transferring control to the OS. That makes sense,
> but I wouldn't describe that as "the device powering up with ASPM
> enabled."
>
Correct.
>
> If BIOS enables L1SS (specifically, if it enables L1.2) when LTR
> hasn't been configured, that sounds like a BIOS defect.
>
Very good point. I'll withdraw this patch then.