Re: [PATCH v2] PCI: cadence: Fix Gen2 Link Retraining process

From: Bjorn Helgaas
Date: Tue May 09 2023 - 14:24:24 EST


On Tue, May 09, 2023 at 12:37:31PM +0530, Siddharth Vadapalli wrote:
> Bjorn,
>
> Thank you for reviewing the patch.
>
> On 09/05/23 02:44, Bjorn Helgaas wrote:
> > On Wed, Mar 15, 2023 at 12:38:00PM +0530, Siddharth Vadapalli wrote:
> >> The Link Retraining process is initiated to account for the Gen2 defect in
> >> the Cadence PCIe controller in J721E SoC. The errata corresponding to this
> >> is i2085, documented at:
> >> https://www.ti.com/lit/er/sprz455c/sprz455c.pdf
> >>
> >> The existing workaround implemented for the errata waits for the Data Link
> >> initialization to complete and assumes that the link retraining process
> >> at the Physical Layer has completed. However, it is possible that the
> >> Physical Layer training might be ongoing as indicated by the
> >> PCI_EXP_LNKSTA_LT bit in the PCI_EXP_LNKSTA register.
> >>
> >> Fix the existing workaround, to ensure that the Physical Layer training
> >> has also completed, in addition to the Data Link initialization.
> >>
> >> Fixes: 4740b969aaf5 ("PCI: cadence: Retrain Link to work around Gen2 training defect")
> >> Signed-off-by: Siddharth Vadapalli <s-vadapalli@xxxxxx>
> >> Reviewed-by: Vignesh Raghavendra <vigneshr@xxxxxx>
> >> ---
> >> Changes from v1:
> >> 1. Collect Reviewed-by tag from Vignesh Raghavendra.
> >> 2. Rebase on next-20230315.
> >>
> >> v1:
> >> https://lore.kernel.org/r/20230102075656.260333-1-s-vadapalli@xxxxxx
> >>
> >> .../controller/cadence/pcie-cadence-host.c | 27 +++++++++++++++++++
> >> 1 file changed, 27 insertions(+)
> >>
> >> diff --git a/drivers/pci/controller/cadence/pcie-cadence-host.c b/drivers/pci/controller/cadence/pcie-cadence-host.c
> >> index 940c7dd701d6..5b14f7ee3c79 100644
> >> --- a/drivers/pci/controller/cadence/pcie-cadence-host.c
> >> +++ b/drivers/pci/controller/cadence/pcie-cadence-host.c
> >> @@ -12,6 +12,8 @@
> >>
> >> #include "pcie-cadence.h"
> >>
> >> +#define LINK_RETRAIN_TIMEOUT HZ
> >> +
> >>  static u64 bar_max_size[] = {
> >>  	[RP_BAR0] = _ULL(128 * SZ_2G),
> >>  	[RP_BAR1] = SZ_2G,
> >> @@ -77,6 +79,27 @@ static struct pci_ops cdns_pcie_host_ops = {
> >>  	.write = pci_generic_config_write,
> >>  };
> >>
> >> +static int cdns_pcie_host_training_complete(struct cdns_pcie *pcie)
> >
> > This is kind of weird because it's named like a predicate, i.e., "this
> > function tells me whether link training is complete", but it returns
> > *zero* for success.
> >
> > This is the opposite of j721e_pcie_link_up(), which returns "true"
> > when the link is up, so code like this reads naturally:
> >
> >   if (pcie->ops->link_up(pcie))
> >     /* do something if the link is up */
>
> I agree. The function name can be changed to indicate that it is
> waiting for completion rather than indicating completion. If this is
> the only change, I will post a patch to fix it. On the other hand,
> based on your comments in the next section, I am thinking of an
> alternative approach of merging the current
> "cdns_pcie_host_training_complete()" function's operation as well
> into the "cdns_pcie_host_wait_for_link()" function. If this is
> acceptable, I will post a different patch and the name change patch
> won't be necessary.

Yeah, sorry, I meant to delete this part of my response after I wrote
the one below.

> >> @@ -118,6 +141,10 @@ static int cdns_pcie_retrain(struct cdns_pcie *pcie)
> >>  	cdns_pcie_rp_writew(pcie, pcie_cap_off + PCI_EXP_LNKCTL,
> >>  			    lnk_ctl);
> >>
> >> +	ret = cdns_pcie_host_training_complete(pcie);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >>  	ret = cdns_pcie_host_wait_for_link(pcie);
> >
> > It seems a little clumsy that we wait for two things in succession:
> >
> >   - cdns_pcie_host_training_complete() waits up to 1s for
> >     PCI_EXP_LNKSTA_LT to be cleared
> >
> >   - cdns_pcie_host_wait_for_link() waits between .9s and 1s for
> >     LINK_UP_DL_COMPLETED on j721e (and not at all for other platforms)
>
> Is it acceptable to merge "cdns_pcie_host_training_complete()" into
> "cdns_pcie_host_wait_for_link()"?

That's what I'm proposing. Maybe someone who is more familiar with
Cadence would have an argument against it, but I think making it
structurally the same as dw_pcie_wait_for_link() would be a good
thing.
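
To make the shape concrete, here is a rough userspace mock of a single
merged wait loop. It is only a sketch, not driver code: struct fake_pcie,
fake_read_lnksta(), fake_link_up() and wait_for_link_sketch() are all
hypothetical names invented for illustration, the retry count of 10 is an
assumed value, and the sleep between polls is replaced by a counter so the
example runs anywhere. The only thing taken from the kernel headers is the
PCI_EXP_LNKSTA_LT bit value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit value as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_LNKSTA_LT	0x0800	/* Link Training in progress */

/* Assumed retry budget for this sketch; not taken from the patch */
#define LINK_WAIT_MAX_RETRIES	10

/* Hypothetical stand-in for the controller: scripted link state */
struct fake_pcie {
	int polls;		/* polls performed so far */
	int lt_clear_at;	/* poll count at which LT reads as cleared */
	int dl_up_at;		/* poll count at which the DL reports up */
};

/* Mock of reading PCI_EXP_LNKSTA from the Root Port */
static uint16_t fake_read_lnksta(const struct fake_pcie *p)
{
	return p->polls >= p->lt_clear_at ? 0 : PCI_EXP_LNKSTA_LT;
}

/* Mock of the platform link_up() predicate: true when the link is up */
static bool fake_link_up(const struct fake_pcie *p)
{
	return p->polls >= p->dl_up_at;
}

/*
 * One bounded loop that checks both conditions, instead of two waits
 * in succession.  Returns 0 on success, -ETIMEDOUT otherwise.
 */
static int wait_for_link_sketch(struct fake_pcie *p)
{
	int retries;

	for (retries = 0; retries < LINK_WAIT_MAX_RETRIES; retries++) {
		bool training = fake_read_lnksta(p) & PCI_EXP_LNKSTA_LT;

		if (!training && fake_link_up(p))
			return 0;

		p->polls++;	/* stands in for a sleep between polls */
	}

	return -ETIMEDOUT;
}
```

With this structure a link whose Data Link layer reports up while
PCI_EXP_LNKSTA_LT is still set is not treated as ready, which is the case
the patch is trying to close, and there is only one timeout to reason
about.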

Bjorn