Re: [PATCH v3] PCI: cadence: Fix Gen2 Link Retraining process

From: Siddharth Vadapalli
Date: Fri Jun 09 2023 - 00:16:49 EST


Hello Mani,

Thank you for reviewing this patch.

On 08/06/23 21:12, Manivannan Sadhasivam wrote:
> On Wed, Jun 07, 2023 at 02:44:27PM +0530, Siddharth Vadapalli wrote:
>> The Link Retraining process is initiated to account for the Gen2 defect in
>> the Cadence PCIe controller in J721E SoC. The errata corresponding to this
>> is i2085, documented at:
>> https://www.ti.com/lit/er/sprz455c/sprz455c.pdf
>>
>> The existing workaround implemented for the errata waits for the Data Link
>> initialization to complete and assumes that the link retraining process
>> at the Physical Layer has completed. However, it is possible that the
>> Physical Layer training might be ongoing as indicated by the
>> PCI_EXP_LNKSTA_LT bit in the PCI_EXP_LNKSTA register.
>>
>> Fix the existing workaround, to ensure that the Physical Layer training
>> has also completed, in addition to the Data Link initialization.
>>
>
> cdns_pcie_host_wait_for_link() function is called even for the non-quirky cases
> as well, so does this patch. But if your patch is only targeting the link
> retraining case, you should move the logic to cdns_pcie_retrain().

In v2 of this patch at:
https://lore.kernel.org/r/20230315070800.1615527-1-s-vadapalli@xxxxxx/
I had implemented it as you suggest above. However, in the discussion
with Bjorn at:
https://lore.kernel.org/r/20230509182416.GA1259841@bhelgaas/
it was agreed that waiting for two things in succession is not the best way
to implement it. Therefore, this patch merges the
cdns_pcie_host_training_complete() function from the v2 patch into the
cdns_pcie_host_wait_for_link() function.
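
To make the merged flow easier to follow, here is a minimal user-space sketch
of the idea: first poll until the Link Training bit clears, then poll until
the link is reported up, all within one function. It is not the driver code.
struct fake_link, fake_read_lnksta(), fake_link_up() and the max_polls
parameter are hypothetical stand-ins for the controller accessors, and a
simple poll count replaces the jiffies-based timeout and the sleeps used in
the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit value as PCI_EXP_LNKSTA_LT in the kernel's pci_regs.h. */
#define LNKSTA_LT 0x0800

/* Hypothetical stand-in for the controller state (not a driver type). */
struct fake_link {
	int training_polls_left;	/* reads for which LT stays set */
	int dl_polls_left;		/* link-up checks that report "down" */
};

/* Hypothetical register read: LT stays set until the counter runs out. */
static uint16_t fake_read_lnksta(struct fake_link *l)
{
	if (l->training_polls_left > 0) {
		l->training_polls_left--;
		return LNKSTA_LT;
	}
	return 0;
}

/* Hypothetical link-up check: reports "down" until the counter runs out. */
static bool fake_link_up(struct fake_link *l)
{
	if (l->dl_polls_left > 0) {
		l->dl_polls_left--;
		return false;
	}
	return true;
}

/* One function covering both waits, each bounded by max_polls. */
static int wait_for_link(struct fake_link *l, int max_polls)
{
	uint16_t status = LNKSTA_LT;
	int i;

	assert(max_polls > 0);

	/* Step 1: wait for Physical Layer training to finish. */
	for (i = 0; i < max_polls; i++) {
		status = fake_read_lnksta(l);
		if (!(status & LNKSTA_LT))
			break;
	}
	if (status & LNKSTA_LT)
		return -ETIMEDOUT;

	/* Step 2: wait for the link to come up. */
	for (i = 0; i < max_polls; i++) {
		if (fake_link_up(l))
			return 0;
	}
	return -ETIMEDOUT;
}
```

With this sketch, a caller sees -ETIMEDOUT from either step, so it cannot
proceed to enumeration while training is still in progress.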

>
>
>> Fixes: 4740b969aaf5 ("PCI: cadence: Retrain Link to work around Gen2 training defect")
>> Signed-off-by: Siddharth Vadapalli <s-vadapalli@xxxxxx>
>> Reviewed-by: Vignesh Raghavendra <vigneshr@xxxxxx>
>> ---
>>
>> Hello,
>>
>> This patch is based on linux-next tagged next-20230606.
>>
>> v2:
>> https://lore.kernel.org/r/20230315070800.1615527-1-s-vadapalli@xxxxxx/
>> Changes since v2:
>> - Merge the cdns_pcie_host_training_complete() function with the
>> cdns_pcie_host_wait_for_link() function, as suggested by Bjorn
>> for the v2 patch.
>> - Add dev_err() to notify when Link Training fails, since this is a
>> fatal error and proceeding from this point will almost always crash
>> the kernel.
>>
>> v1:
>> https://lore.kernel.org/r/20230102075656.260333-1-s-vadapalli@xxxxxx/
>> Changes since v1:
>> - Collect Reviewed-by tag from Vignesh Raghavendra.
>> - Rebase on next-20230315.
>>
>> Regards,
>> Siddharth.
>>
>> .../controller/cadence/pcie-cadence-host.c | 20 +++++++++++++++++++
>> 1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/pci/controller/cadence/pcie-cadence-host.c b/drivers/pci/controller/cadence/pcie-cadence-host.c
>> index 940c7dd701d6..70a5f581ff4f 100644
>> --- a/drivers/pci/controller/cadence/pcie-cadence-host.c
>> +++ b/drivers/pci/controller/cadence/pcie-cadence-host.c
>> @@ -12,6 +12,8 @@
>>
>> #include "pcie-cadence.h"
>>
>> +#define LINK_RETRAIN_TIMEOUT HZ
>> +
>> static u64 bar_max_size[] = {
>> [RP_BAR0] = _ULL(128 * SZ_2G),
>> [RP_BAR1] = SZ_2G,
>> @@ -80,8 +82,26 @@ static struct pci_ops cdns_pcie_host_ops = {
>> static int cdns_pcie_host_wait_for_link(struct cdns_pcie *pcie)
>> {
>> struct device *dev = pcie->dev;
>> + unsigned long end_jiffies;
>> + u16 link_status;
>> int retries;
>>
>> + /* Wait for link training to complete */
>> + end_jiffies = jiffies + LINK_RETRAIN_TIMEOUT;
>> + do {
>> + link_status = cdns_pcie_rp_readw(pcie, CDNS_PCIE_RP_CAP_OFFSET + PCI_EXP_LNKSTA);
>> + if (!(link_status & PCI_EXP_LNKSTA_LT))
>> + break;
>> + usleep_range(0, 1000);
>> + } while (time_before(jiffies, end_jiffies));
>> +
>> + if (!(link_status & PCI_EXP_LNKSTA_LT)) {
>> + dev_info(dev, "Link training complete\n");
>
> This info is not needed.

Sure. I will drop it in the v4 patch.

>
>> + } else {
>> + dev_err(dev, "Fatal! Link training incomplete\n");
>
> This could be, "Link retraining incomplete".

I added the word "Fatal" because Linux almost always crashes if link training
has not completed by the time the PCI subsystem attempts to enumerate the EP
devices. The word "Fatal" conveys how critical this error is and helps users
identify the cause of the crash, which they might otherwise overlook.

>
> - Mani
>
>> + return -ETIMEDOUT;
>> + }
>> +
>> /* Check if the link is up or not */
>> for (retries = 0; retries < LINK_WAIT_MAX_RETRIES; retries++) {
>> if (cdns_pcie_link_up(pcie)) {
>> --
>> 2.25.1
>>
>

--
Regards,
Siddharth.