Re: [PATCH v3 3/9] NTB: Remove pci_aer_clear_nonfatal_status() call

From: Serge Semin
Date: Tue Dec 06 2022 - 16:41:17 EST


Hi Bjorn

On Tue, Dec 06, 2022 at 12:09:56PM -0600, Bjorn Helgaas wrote:
> On Wed, Sep 28, 2022 at 02:03:55PM +0300, Serge Semin wrote:
> > On Wed, Sep 28, 2022 at 06:59:40PM +0800, Zhuo Chen wrote:
> > > There is no need to clear error status during init code, so remove it.
> >
> > Why do you think there isn't? Justify in more details.
>
> Thanks for taking a look, Sergey! I agree we should leave it or add
> the rationale here.
>
> > > Signed-off-by: Zhuo Chen <chenzhuo.1@xxxxxxxxxxxxx>
> > > ---
> > > drivers/ntb/hw/idt/ntb_hw_idt.c | 2 --
> > > 1 file changed, 2 deletions(-)
> > >
> > > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > index 0ed6f809ff2e..fed03217289d 100644
> > > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > @@ -2657,8 +2657,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> > > ret = pci_enable_pcie_error_reporting(pdev);
> > > if (ret != 0)
> > > dev_warn(&pdev->dev, "PCIe AER capability disabled\n");
> > > - else /* Cleanup nonfatal error status before getting to init */
> > > - pci_aer_clear_nonfatal_status(pdev);
>
> I do think drivers should not need to clear errors; I think the PCI
> core should be responsible for that.
>
> And I think the core *does* do that in this path:
>
> pci_init_capabilities
> pci_aer_init
> pci_aer_clear_status
> pci_aer_raw_clear_status
> pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS)
> pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS)
>
> pci_aer_clear_nonfatal_status() clears only non-fatal uncorrectable
> errors, while pci_aer_init() clears all correctable and all
> uncorrectable errors, so the PCI core is already doing more than
> idt_init_pci() does.
>
> So I think this change is good because it removes some work from the
> driver, but let me know if you think otherwise.

It's hard to remember now all the details but IIRC back when this
driver was developed the "Unsupported Request" flag was left uncleared
on our platform even after the probe completion. Most likely an
erroneous TLP was generated by some action performed on the device
probe stage. The forced cleanup of the AER status solved that problem.
On the other hand the problem of having the UnsupReq+ flag set was
solved some time after the driver was merged in into the kernel (it
was caused by a vendor-specific behavior of the IDT PCIe switch placed
on the path between a RP and PCIe NTB). So since the original reason
of having the pci_aer_clear_nonfatal_status() method called here was
platform specific and fixed now anyway, and the AER flags cleanup is
done by the core, then I have no reason to be against the patch. It
would be good to add your clarification to the commit message though.

Reviewed-by: Serge Semin <fancer.lancer@xxxxxxxxx>

-Serge(y)

>
> > >
> > > /* First enable the PCI device */
> > > ret = pcim_enable_device(pdev);
> > > --
> > > 2.30.1 (Apple Git-130)
> > >