Re: [PATCH v11 1/8] PCI/ERR: Update error status after reset_link()

From: Bjorn Helgaas
Date: Fri Jan 03 2020 - 21:54:20 EST


On Fri, Jan 03, 2020 at 05:03:03PM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 1/3/20 4:34 PM, Bjorn Helgaas wrote:
> > On Thu, Dec 26, 2019 at 04:39:07PM -0800, sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx wrote:
> > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> > >
> > > Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
> > > reset_link() to recover from fatal errors. But, if the reset is
> > > successful there is no need to continue the rest of the error recovery
> > > checks. Also, during fatal error recovery, if the initial value of error
> > > status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER then
> > > even after successful recovery (using reset_link()) pcie_do_recovery()
> > > will report the recovery result as failure. So update the status of
> > > error after reset_link().
> > I like the part about updating "status" with the result of
> > reset_link(), and I split that into its own patch because it
> > seems like a fix that *can* be separated.
> >
> > But I'm not convinced that we should skip the ->slot_reset()
> > callbacks if the reset_link() was successful.
>
> If reset_link() call is successful then the result value will be
> "PCI_ERS_RESULT_RECOVERED". So even if you proceed with
> rest of the code, slot_reset() will never get called right ?

The current code:

if (state == pci_channel_io_frozen &&
reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
goto failed;
...
if (status == PCI_ERS_RESULT_NEED_RESET) {
status = PCI_ERS_RESULT_RECOVERED;
pci_walk_bus(bus, report_slot_reset, &status);

doesn't save the result of reset_link(), so if status was
PCI_ERS_RESULT_NEED_RESET and the reset succeeds, we will call
->slot_reset().

After your patch, if "state == pci_channel_io_frozen", we *never* call
->slot_reset().

Do you think that matches pci-error-recovery.rst? It doesn't seem
like it to me, but perhaps I haven't read it closely enough.

> > According to
> > Documentation/PCI/pci-error-recovery.rst, we should call
> > ->slot_reset() after completion of the reset.
> >
> > For example, rsxx_err_handler implements ->slot_reset(), but
> > not ->resume(). If we reset the device, we'll claim success and
> > return, but we won't call rsxx_slot_reset(), which does a bunch
> > of important-looking recovery stuff.
> >
> > If pci-error-recovery.rst is wrong, we should fix that (after
> > auditing all the drivers to make sure they match).
> >
> > > Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
> > > Cc: Ashok Raj <ashok.raj@xxxxxxxxx>
> > > Cc: Keith Busch <keith.busch@xxxxxxxxx>
> > > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> > > Acked-by: Keith Busch <keith.busch@xxxxxxxxx>
> > > ---
> > > drivers/pci/pcie/err.c | 10 +++++++---
> > > 1 file changed, 7 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > index b0e6048a9208..53cd9200ec2c 100644
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > > @@ -204,9 +204,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> > > else
> > > pci_walk_bus(bus, report_normal_detected, &status);
> > > - if (state == pci_channel_io_frozen &&
> > > - reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
> > > - goto failed;
> > > + if (state == pci_channel_io_frozen) {
> > > + status = reset_link(dev, service);
> > > + if (status != PCI_ERS_RESULT_RECOVERED)
> > > + goto failed;
> > > + goto done;
> > > + }
> > > if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> > > status = PCI_ERS_RESULT_RECOVERED;
> > > @@ -228,6 +231,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> > > if (status != PCI_ERS_RESULT_RECOVERED)
> > > goto failed;
> > > +done:
> > > pci_dbg(dev, "broadcast resume message\n");
> > > pci_walk_bus(bus, report_resume, &status);
> > > --
> > > 2.21.0
> > >
> --
> Sathyanarayanan Kuppuswamy
> Linux kernel developer
>