RE: [PATCH v2] PCI: Fix up L1SS capability for Intel Apollolake PCIe bridge

From: Lee, Ron
Date: Fri Dec 16 2022 - 11:29:52 EST


> On Thu, Dec 15, 2022 at 05:13:57PM +0800, Ron Lee wrote:
> > On Google Coral and Reef family chromebooks, the PCIe bridge lost its
> > L1 PM Substates capability after resumed from D3cold, and identify
> > that the pointer to the this capability and capapability header are
> > missing from the capability list.
>
> s/chromebooks/Chromebooks/
> s/to the this/this/
> s/capapability/capability/
I will submit patch v3 to correct these errors.
>
> This should say what problem we're solving. I assume some devices used L1 PM
> Substates before suspend, but after resume they do not, so the user-visible
> effect is that battery life is worse after resume.
This bug has existed since these series of Chromebooks was shipping, it seems no harm for system execution,
and we didn't identified battery life drop after resume. However we still expect this issue could be solved and
follow spec criteria as per PCIe spec rev6.0, section 5.5.4 L1 PM Substates Configuration

An L1 PM Substate enable bit must only be Set in the Upstream and Downstream Ports on a Link when the
corresponding supported capability bit is Set by both the Upstream and Downstream Ports on that Link, otherwise the
behavior is undefined

>
> > Capabilities: [150 v0] Null
> > Capabilities: [200 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ ...
> > PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> > L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > T_CommonMode=40us LTR1.2_Threshold=98304ns
> > L1SubCtl2: T_PwrOn=60us
>
> I'm not sure what this snippet is telling me. Based on the patch, I guess before
> suspend, lspci would show:
>
> Capabilities: [150 v0] Null
> Capabilities: [200 v1] L1 PM Substates
> Capabilities: [220] <some other valid capability?>
>
> but after resume, you see only:
>
> Capabilities: [150 v0] Null
>
> Right?
Yes, but there is a difference in this case.
Before suspend:
....
Capabilities: [140 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [150 v0] Null
Capabilities: [200 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
T_CommonMode=40us LTR1.2_Threshold=98304ns
L1SubCtl2: T_PwrOn=60us
Kernel driver in use: pcieport

After resume:
....
Capabilities: [140 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Kernel driver in use: pcieport

The following merged commit can save/restore the L1SubCap/L1SubCtl1/L1SubCtl2 registers for this bridge,
However this bridge not only lost its capability contents but also lost the link to this capability.

PCI/ASPM: Save/restore L1SS Capability for suspend/resume
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/pcie/aspm.c?h=v6.1&id=4257f7e008ea394fcecc050f1569c3503b8bcc15

>
> > This patch fix up the header and the pointer to the L1SS capability
> > after resuming from D3Cold.
>
> The main problem here is that this patch covers up an issue without saying what
> the root cause is. Presumably this is a firmware issue.
> Has that been identified? Has it been fixed for future firmware releases?
This issue could be and should be fixed by BIOS, however the manufacturers have no resource for firmware validation and it's risky for firmware update per their assessment.
>
> s/D3Cold/D3cold/ to match above.
>
> Is there a bug report for this issue? Include the URL here.
>
> Is there a bug report for the firmware?
>
There is a Google's internal issue tracker for this bug, seems not available for public.
Actually this bug had a discussion on this thread, and Lukasz Majczak identified this issue on Apollo Lake platform.
https://patchwork.kernel.org/project/linux-pci/patch/20220705060014.10050-1-vidyas@xxxxxxxxxx/

> > Signed-off-by: Ron Lee <ron.lee@xxxxxxxxx>
> > Reported-by: kernel test robot <lkp@xxxxxxxxx>
> > ---
>
> Nits:
>
> - Use "Apollo Lake" to match Intel usage.
>
> - Below the "---" line, mention what changed between v1 and v2 (I
> see that you added the "#ifdef CONFIG_PCIEASPM", but you should
> save readers the effort of figuring that out).
>
> - For work-in-progress, the "Reported-by: kernel test robot" is
> pointless and I will remove it. This quirk is not fixing a bug
> reported by the robot.
>
> > drivers/pci/quirks.c | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index
> > 285acc4aaccc..fc959be17a9d 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5992,3 +5992,20 @@
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,
> > 0x9a2d, dpc_log_size); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,
> > 0x9a2f, dpc_log_size); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,
> > 0x9a31, dpc_log_size); #endif
> > +
> > +#ifdef CONFIG_PCIEASPM
> > +static void chromeos_fixup_apl_bridge_l1ss_capability(struct pci_dev
> > +*pdev) {
> > + if (!dmi_match(DMI_SYS_VENDOR, "Google") ||
> > + (!dmi_match(DMI_PRODUCT_FAMILY, "Google_Coral") &&
> > + !dmi_match(DMI_PRODUCT_FAMILY, "Google_Reef")))
> > + return;
> > +
> > + pci_info(pdev, "Fix up L1SS Capability\n");
> > + /* Fix up the L1SS Capability Header*/
> > + pci_write_config_dword(pdev, pdev->l1ss, (0x220 << 20) | (1 << 16) |
> > +(PCI_EXT_CAP_ID_L1SS));
>
> This looks like it adds a link to another capability at offset 0x220.
> What is that, and how do we know this is safe?
The following is the dump of this bridge config before suspend, the L1SS capability is at offset 0x200 and
it point to offset 0x220 which is a null capability. This patch just add it to keep consistent during suspend/resume.
...
150: 00 00 00 20 00 04 00 00 00 00 00 00 00 00 00 00
160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200: 1e 00 01 22 1f 28 28 00 0f 28 03 60 f0 00 00 00
210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220: 00 00 00 00 00 00 00 00 00 00 00 00 7f 7f 7f 7f...

>
> These registers are read-only per spec (PCIe r6.0, sec 7.8.3.1), but I guess you
> have device-specific knowledge that they are writable?
>
> > + /* Fix up the pointer to L1SS Capability*/
> > + pci_write_config_dword(pdev, 0x150, pdev->l1ss << 20); }
> > +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6,
> > +chromeos_fixup_apl_bridge_l1ss_capability);
> > +#endif
> >
> > base-commit: e2ca6ba6ba0152361aa4fcbf6067db71b2c7a770
> > --
> > 2.17.1
> >