Re: [PATCH v3 1/1] PCI/ERR: Fix reset logic in pcie_do_recovery() call

From: Kuppuswamy, Sathyanarayanan
Date: Tue Sep 22 2020 - 19:44:13 EST




On 9/22/20 4:33 PM, Bjorn Helgaas wrote:
> On Tue, Sep 22, 2020 at 02:44:51PM -0700, Kuppuswamy, Sathyanarayanan wrote:
>> On 9/22/20 11:52 AM, Bjorn Helgaas wrote:
>>> On Fri, Jul 24, 2020 at 12:07:55PM -0700, sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx wrote:
>>>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
>>>>
>>>> Current pcie_do_recovery() implementation has following two issues:
>>>
>>> I'm having trouble parsing this out, probably just lack of my
>>> understanding...
>>>
>>>> 1. Fatal (DPC) error recovery is currently broken for non-hotplug
>>>> capable devices. Current fatal error recovery implementation relies
>>>> on PCIe hotplug (pciehp) handler for detaching and re-enumerating
>>>> the affected devices/drivers. pciehp handler listens for DLLSC state
>>>> changes and handles device/driver detachment on DLLSC_LINK_DOWN event
>>>> and re-enumeration on DLLSC_LINK_UP event. So when dealing with
>>>> non-hotplug capable devices, recovery code does not restore the state
>>>> of the affected devices correctly.
>>>
>>> Apparently in the hotplug case, something *does* restore the state of
>>> affected devices?
>>
>> Yes, in hotplug case, DLLSC state change handler takes over detachment
>> /cleanup and re-attachment of affected devices/drivers.
>
> Where does the restore happen here? I.e., what function does this?

The DLLSC link down event removes the affected devices and their drivers,
and the link up event re-creates all the devices. The call chains are:

On DLLSC link down event:
->pciehp_ist()
->pciehp_handle_presence_or_link_change()
->pciehp_disable_slot()
->__pciehp_disable_slot()
->remove_board()
->pciehp_unconfigure_device()

On DLLSC link up event:
->pciehp_ist()
->pciehp_handle_presence_or_link_change()
->pciehp_enable_slot()
->__pciehp_enable_slot()
->board_added()
->pciehp_configure_device()
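
To make the two paths above easier to compare, here is a rough user-space
sketch in C. It is only a model, not kernel code: struct slot_model,
enum link_event, trace_call() and model_handle_link_change() are hypothetical
names invented for illustration. Only the function names stored in the two
chains come from the call paths listed above. The model records which chain
an event would walk and flips a flag standing in for "devices are enumerated";
it ignores the real pciehp state handling entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model types; nothing here is a kernel definition. */
enum link_event { LINK_DOWN, LINK_UP };

struct slot_model {
	bool enabled;     /* true while the devices below the port exist */
	char trace[256];  /* call chain recorded by the last event */
};

#define CHAIN_LEN 6

/* Function names as listed in this thread, outermost caller first. */
static const char *const down_chain[CHAIN_LEN] = {
	"pciehp_ist", "pciehp_handle_presence_or_link_change",
	"pciehp_disable_slot", "__pciehp_disable_slot",
	"remove_board", "pciehp_unconfigure_device",
};

static const char *const up_chain[CHAIN_LEN] = {
	"pciehp_ist", "pciehp_handle_presence_or_link_change",
	"pciehp_enable_slot", "__pciehp_enable_slot",
	"board_added", "pciehp_configure_device",
};

/* Append one function name to the recorded chain. */
static void trace_call(struct slot_model *s, const char *fn)
{
	size_t used = strlen(s->trace);

	assert(used < sizeof(s->trace));
	snprintf(s->trace + used, sizeof(s->trace) - used, "%s%s",
		 used ? " -> " : "", fn);
}

/* Model of one DLLSC event: record the chain, then update the slot flag. */
static void model_handle_link_change(struct slot_model *s, enum link_event ev)
{
	const char *const *chain = (ev == LINK_DOWN) ? down_chain : up_chain;
	size_t i;

	s->trace[0] = '\0';
	for (i = 0; i < CHAIN_LEN; i++)
		trace_call(s, chain[i]);

	/* Link down tears the devices down; link up enumerates them again. */
	s->enabled = (ev == LINK_UP);
}
```

Calling model_handle_link_change() with LINK_DOWN and then LINK_UP on a
slot_model leaves enabled set to true again, with trace holding the link up
chain. That is the detach and re-enumerate sequence the hotplug path provides
and the non-hotplug path currently lacks.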



--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer