Re: [PATCH v8 2/2] PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream device

From: Jim Quinlan
Date: Thu Jan 11 2024 - 13:21:13 EST


On Thu, Jan 11, 2024 at 12:28 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Mon, Nov 13, 2023 at 01:56:06PM -0500, Jim Quinlan wrote:
> > The Broadcom STB/CM PCIe HW core, which is also used in RPi SOCs, must be
> > deliberately set by the PCIe RC HW into one of three mutually exclusive
> > modes:
> >
> > "safe" -- No CLKREQ# expected or required, refclk is always provided. This
> > mode should work for all devices but is not capable of any refclk
> > power savings.
> >
> > "no-l1ss" -- CLKREQ# is expected to be driven by the downstream device for
> > CPM and ASPM L0s and L1. Provides Clock Power Management, L0s, and L1,
> > but cannot provide L1 substate (L1SS) power savings. If the downstream
> > device connected to the RC is L1SS capable AND the OS enables L1SS, all
> > PCIe traffic may abruptly halt, potentially hanging the system.
> >
> > "default" -- Bidirectional CLKREQ# between the RC and downstream device.
> > Provides ASPM L0s, L1, and L1SS, but not compliant to provide Clock
> > Power Management; specifically, may not be able to meet the T_CLRon max
> > timing of 400ns as specified in "Dynamic Clock Control", section
> > 3.2.5.2.2 of the PCI Express Mini CEM 2.1 specification. This
> > situation is atypical and should happen only with older devices.
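
For illustration only, here is a minimal C sketch of the three modes as a capability table. It models only what the description above states: Clock Power Management compliance, L1SS support, and the one combination that may halt all PCIe traffic. The identifiers (clkreq_mode, clkreq_caps, clkreq_l1ss_may_hang) are hypothetical and are not taken from pcie-brcmstb.c. Whether "safe" mode permits ASPM L0s/L1 is not stated, so it is deliberately not modeled.

```c
#include <stdbool.h>

/* Hypothetical identifiers; not the names used in pcie-brcmstb.c. */
enum clkreq_mode { CLKREQ_SAFE, CLKREQ_NO_L1SS, CLKREQ_DEFAULT };

/* Only the properties stated in the commit message are modeled. */
struct clkreq_caps {
	bool cpm;	/* spec-compliant Clock Power Management */
	bool l1ss;	/* ASPM L1 substates (L1.1/L1.2) */
};

static const struct clkreq_caps clkreq_caps[] = {
	[CLKREQ_SAFE]    = { .cpm = false, .l1ss = false }, /* refclk always on */
	[CLKREQ_NO_L1SS] = { .cpm = true,  .l1ss = false },
	[CLKREQ_DEFAULT] = { .cpm = false, .l1ss = true  }, /* may miss 400 ns T_CLRon max */
};

/*
 * True for the combination the description warns about: "no-l1ss" mode
 * with an L1SS-capable downstream device and the OS enabling L1SS.
 */
static bool clkreq_l1ss_may_hang(enum clkreq_mode mode, bool dev_l1ss_capable,
				 bool os_enables_l1ss)
{
	return mode == CLKREQ_NO_L1SS && dev_l1ss_capable && os_enables_l1ss;
}
```

The table form makes the trade-off explicit: no single mode provides both compliant CPM and L1SS.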
> >
> > Previously, this driver always set the mode to "no-l1ss", as almost all
> > STB/CM boards operate in this mode. But now there is interest in
> > activating L1SS power savings from STB/CM customers, which requires "aspm"
> > mode.
>
> I think this should read "default" mode, not "aspm" mode, since "aspm"
> is not a mode implemented by this patch, right?

Correct.
>
>
> > In addition, a bug was filed for RPi4 CM platform because most
> > devices did not work in "no-l1ss" mode.
>
> I think this refers to bug 217276, mentioned below?

I guess you are saying I should put a footnote marker there.

>
>
> > Note that the mode is specified by the DT property "brcm,clkreq-mode". If
> > this property is omitted, then "default" mode is chosen.
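
As a hedged, userspace-only sketch of the selection rule just described: the property's string value (or NULL when "brcm,clkreq-mode" is absent) maps to a mode, and an absent property yields "default". The function and enum names are hypothetical. Rejecting unrecognized strings with CLKREQ_INVALID is an assumption, since the description does not say how unknown values are handled; this is not the driver's actual parsing code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers; not the names used in pcie-brcmstb.c. */
enum clkreq_mode { CLKREQ_SAFE, CLKREQ_NO_L1SS, CLKREQ_DEFAULT, CLKREQ_INVALID };

/* prop: value of "brcm,clkreq-mode", or NULL if the property is omitted. */
static enum clkreq_mode parse_clkreq_mode(const char *prop)
{
	static const struct {
		const char *name;
		enum clkreq_mode mode;
	} table[] = {
		{ "safe",    CLKREQ_SAFE },
		{ "no-l1ss", CLKREQ_NO_L1SS },
		{ "default", CLKREQ_DEFAULT },
	};
	size_t i;

	if (prop == NULL)
		return CLKREQ_DEFAULT;	/* property omitted: "default" mode */

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(prop, table[i].name) == 0)
			return table[i].mode;

	return CLKREQ_INVALID;	/* assumption: unknown strings are rejected */
}
```

For example, parse_clkreq_mode(NULL) and parse_clkreq_mode("default") both select "default" mode, while a value such as "aspm" is rejected in this sketch.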
> >
> > Note: Since L1 substates are now possible, a modification was made
> > regarding an internal bus timeout: During long periods of the PCIe RC HW
> > being in an L1SS sleep state, there may be a timeout on an internal bus
> > access, even though there may not be any PCIe access involved. Such a
> > timeout will cause a subsequent CPU abort.
>
> This sounds scary. If a NIC is put in L1.2, does this mean we will
> see this CPU abort if there's no traffic for a long time? What is
> needed to avoid the CPU abort?

I don't think this happens in normal practice, as a slew of low-level
TLPs and LTR messages are sent on a regular basis. The only time this
timeout occurred was when a major customer was doing a hack: IIRC,
their endpoint device has to reboot itself after link-up and driver
probe, so it goes into L1.2 while it reboots, and during that time the
link is completely silent.


>
> What does this mean for users? L1SS is designed for long periods of
> the device being idle, so this leaves me feeling that using L1SS is
> unsafe in general. Hopefully this impression is unwarranted, and all
> we need is some clarification here.


I don't think it will affect most users, if any.

Regards,
Jim Quinlan
Broadcom STB/CM



>
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217276
> >
> > Signed-off-by: Jim Quinlan <james.quinlan@xxxxxxxxxxxx>
> > Tested-by: Florian Fainelli <florian.fainelli@xxxxxxxxxxxx>
> > ---
> > drivers/pci/controller/pcie-brcmstb.c | 96 ++++++++++++++++++++++++---
> > 1 file changed, 86 insertions(+), 10 deletions(-)
> > ...