Re: [RFC HACK PATCH] PCI: dwc: layerscape: Hack around enumeration problems with Honeycomb LX2K

From: Rob Herring
Date: Fri Dec 11 2020 - 10:15:21 EST


On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson
<daniel.thompson@xxxxxxxxxx> wrote:
>
> I have been chasing down a problem enumerating an NVMe drive on a
> Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate
> successfully if the we are emitting lots of console messages via a UART.
> If the system is booted with `quiet` set then enumeration fails.

We really need to get away from work-arounds for device X on host Y. I
suspect they are not limited to that combination only...

How exactly does it fail to enumerate? There's a (racy) linkup check
on config accesses, is it reporting link down and skipping config
accesses?

> I guessed this would be due to the timing impact of printk-to-UART and
> tried to find out where a delay could be added to provoke a successful
> enumeration.
>
> This patch contains the results. The delay length (1ms) was selected by
> binary searching downwards until the delay was not effective for these
> devices (Honeycomb LX2K and a Western Digital WD Blue SN550).
>
> I have also included the workaround twice (conditionally compiled). The
> first change is the *latest* possible code path that we can deploy a
> delay whilst the second is the earliest place I could find.
>
> The summary is that the critical window were we are currently relying on
> a call to the console UART code can "mend" the driver runs from calling
> dw_pcie_setup_rc() in host init to just before we read the state in the
> link up callback.
>
> Signed-off-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx>
> ---
>
> Notes:
> This patch is RFC (and HACK) because I don't have much clue *why* this
> patch works... merely that this is the smallest possible change I need
> to replicate the current accidental printk() workaround. Perhaps one
> could argue that RFC here stands for request-for-clue. All my
> observations and changes here are empirical and I don't know how best to
> turn them into something that is not a hack!
>
> BTW I noticed many other pcie-designware drivers take advantage
> of a function called dw_pcie_wait_for_link() in their init paths...
> but my naive attempts to add it to the layerscape driver results
> in non-booting systems so I haven't embarrassed myself by including
> that in the patch!

You need to look at what's pending for v5.11, because I reworked this
to be more unified. The ordering of init is also possibly changed. The
sequence is now like this:

dw_pcie_setup_rc(pp);
dw_pcie_msi_init(pp);

if (!dw_pcie_link_up(pci) && pci->ops->start_link) {
ret = pci->ops->start_link(pci);
if (ret)
goto err_free_msi;
}

/* Ignore errors, the link may come up later */
dw_pcie_wait_for_link(pci);


> drivers/pci/controller/dwc/pci-layerscape.c | 35 +++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
> index f24f79a70d9a..c354904b90ef 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -22,6 +22,8 @@
>
> #include "pcie-designware.h"
>
> +#define WORKAROUND_LATEST_POSSIBLE
> +
> /* PEX1/2 Misc Ports Status Register */
> #define SCFG_PEXMSCPORTSR(pex_idx) (0x94 + (pex_idx) * 4)
> #define LTSSM_STATE_SHIFT 20
> @@ -113,10 +115,31 @@ static int ls_pcie_link_up(struct dw_pcie *pci)
> struct ls_pcie *pcie = to_ls_pcie(pci);
> u32 state;
>
> + /*
> + * Strictly speaking *this* (before the ioread32) is the latest
> + * point a simple delay can be effective. If we move the delay
> + * after the ioread32 then the NVMe does not enumerate.
> + *
> + * However this function appears to be frequently called so an
> + * unconditional delay here causes noticeable delay at boot
> + * time. Hence we implement the workaround by retrying the read
> + * after a short delay if we think we might need to return false.
> + */
> +
> state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
> pcie->drvdata->ltssm_shift) &
> LTSSM_STATE_MASK;
>
> +#ifdef WORKAROUND_LATEST_POSSIBLE
> + if (state < LTSSM_PCIE_L0) {
> + /* see comment above */
> + mdelay(1);
> + state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >>
> + pcie->drvdata->ltssm_shift) &
> + LTSSM_STATE_MASK;

Side note, I really wonder about the variety of ways in DWC drivers we
have to check link state. The default is a debug register... Are the
standard PCIe registers for this really broken in these cases?

> + }
> +#endif
> +
> if (state < LTSSM_PCIE_L0)
> return 0;
>
> @@ -152,6 +175,18 @@ static int ls_pcie_host_init(struct pcie_port *pp)
>
> dw_pcie_setup_rc(pp);

Might be related to the PORT_LOGIC_SPEED_CHANGE we do in here.

> +#ifdef WORKAROUND_EARLIEST_POSSIBLE
> + /*
> + * This is the earliest point the delay is effective.
> + * If we move it before dw_pcie_setup_rc() then the
> + * NVMe does not enumerate.
> + *
> + * 500us is too short to reliably work around the issue
> + * hence adopting 1000us here.
> + */
> + mdelay(1);
> +#endif
> +
> return 0;
> }
>
>
> base-commit: 0477e92881850d44910a7e94fc2c46f96faa131f
> --
> 2.29.2
>