RE: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP

From: Khandelwal, Rajat
Date: Mon Jan 02 2023 - 12:38:32 EST


Hi Paul, Sasha
Thanks for the acknowledgement!

-> Will add the example logs
-> Device: https://www.hp.com/us-en/monitors-accessories/computer-accessories/thunderbolt-G4-dock.html
-> correctible -> correctable
-> I guess acc to the convention, I still have to use #ifdef for my function since it
references variables that won't exist if the condition is not met.
However, I have used the IS_ENABLED macro to call the function inside igc_probe().
I hope that's okay!

-> One last thing, I was also skeptical on the location of this function, but then I witnessed
netxen_mask_aer_correctable() function inside net/ethernet/qlogic/netxen/netxen_nic_main.c,
which masks the correctable errors in its PCIe device.
Also, I don’t see a CONFIG_PCIEAER macro enabled function in pci/quirks.c!
I still think to keep the function in igc_main.c, but I am waiting for your judgement.

@Neftin, Sasha, I and my team prefer masking these errors rather than debugging them.
First, they are correctable and non-fatal. Second, these errors are observed in many of the devices I
have worked with (i.e., replay errors). Maybe there is something universal which has to be done for the
thunderbolt domain regarding these specific replay errors in the long term?
Anyhow, we would like to mask these errors for now to avoid any confusions when ethernet gets
connected to the dock. I hope that will be okay? Waiting for your judgement :)

Let me know on any more queries and any suggestions until I roll out v2.

Thanks
Rajat

-----Original Message-----
From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Sent: Sunday, January 1, 2023 4:02 PM
To: Rajat Khandelwal <rajat.khandelwal@xxxxxxxxxxxxxxx>
Cc: Brandeburg, Jesse <jesse.brandeburg@xxxxxxxxx>; Nguyen, Anthony L <anthony.l.nguyen@xxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; pabeni@xxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; intel-wired-lan@xxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Khandelwal, Rajat <rajat.khandelwal@xxxxxxxxx>; Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx
Subject: Re: [Intel-wired-lan] [PATCH] igc: Mask replay rollover/timeout errors in I225_LMVP

[Cc: +Bjorn, +linux-pci]


Dear Rajat,


Thank you for your patch.

Am 29.12.22 um 13:26 schrieb Rajat Khandelwal:
> The CPU logs get flooded with replay rollover/timeout AER errors in
> the system with i225_lmvp connected, usually inside thunderbolt devices.

Please add one example log message to the commit message.

> One of the prominent TBT4 docks we use is HP G4 Hook2, which
> incorporates

I couldn’t find that device. Is that the correct name?

> an Intel Foxville chipset, which uses the igc driver.

Please add a blank line between paragraphs.

> On connecting ethernet, CPU logs get inundated with these errors. The
> point is we shouldn't be spamming the logs with such correctible
> errors as it

correctable

> confuses other kernel developers less familiar with PCI errors,
> support staff, and users who happen to look at the logs.

Please reference the bug reports (bug tracker and mailing list), you know of, where this was reported.

> Signed-off-by: Rajat Khandelwal <rajat.khandelwal@xxxxxxxxxxxxxxx>
> ---
> drivers/net/ethernet/intel/igc/igc_main.c | 28 +++++++++++++++++++++--
> 1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
> b/drivers/net/ethernet/intel/igc/igc_main.c
> index ebff0e04045d..a3a6e8086c8d 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -6201,6 +6201,26 @@ u32 igc_rd32(struct igc_hw *hw, u32 reg)
> return value;
> }
>
> +#ifdef CONFIG_PCIEAER
> +static void igc_mask_aer_replay_correctible(struct igc_adapter
> +*adapter)

correctable

> +{
> + struct pci_dev *pdev = adapter->pdev;
> + u32 aer_pos, corr_mask;

Instead of using the preprocessor, use a normal C conditional. From
`Documentation/process/coding-style.rst`:

> Within code, where possible, use the IS_ENABLED macro to convert a
> Kconfig symbol into a C boolean expression, and use it in a normal C conditional:
>
> .. code-block:: c
>
> if (IS_ENABLED(CONFIG_SOMETHING)) {
> ...
> }
>
> The compiler will constant-fold the conditional away, and include or
> exclude the block of code just as with an #ifdef, so this will not add
> any runtime overhead. However, this approach still allows the C
> compiler to see the code inside the block, and check it for
> correctness (syntax, types, symbol references, etc). Thus, you still
> have to use an #ifdef if the code inside the block references symbols that will not exist if the condition is not met.


> +
> + if (pdev->device != IGC_DEV_ID_I225_LMVP)
> + return;
> +
> + aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
> + if (!aer_pos)
> + return;
> +
> + pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, &corr_mask);
> +
> + corr_mask |= PCI_ERR_COR_REP_ROLL | PCI_ERR_COR_REP_TIMER;
> + pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_MASK, corr_mask);
> +} #endif
> +
> /**
> * igc_probe - Device Initialization Routine
> * @pdev: PCI device information struct @@ -6236,8 +6256,6 @@ static
> int igc_probe(struct pci_dev *pdev,
> if (err)
> goto err_pci_reg;
>
> - pci_enable_pcie_error_reporting(pdev);
> -
> err = pci_enable_ptm(pdev, NULL);
> if (err < 0)
> dev_info(&pdev->dev, "PCIe PTM not supported by PCIe
> bus/controller\n"); @@ -6272,6 +6290,12 @@ static int igc_probe(struct pci_dev *pdev,
> if (!adapter->io_addr)
> goto err_ioremap;
>
> +#ifdef CONFIG_PCIEAER
> + igc_mask_aer_replay_correctible(adapter);
> +#endif
> +
> + pci_enable_pcie_error_reporting(pdev);
> +
> /* hw->hw_addr can be zeroed, so use adapter->io_addr for unmap */
> hw->hw_addr = adapter->io_addr;
>


Kind regards,

Paul