Re: [PATCH v1 1/1] PCI/ERR: Handle fatal error recovery for non-hotplug capable devices

From: Kuppuswamy, Sathyanarayanan
Date: Wed May 27 2020 - 23:59:15 EST




On 5/26/20 11:41 PM, Yicong Yang wrote:
We should do slot reset if driver required, but it's different from the `slot reset` in pci_bus_error_reset().
Previously we don't do a slot reset and call ->slot_reset() directly, I don't know the certain reason.
IIUC, your concern is whether it is correct to trigger reset for
pci_channel_io_normal case right ? Please correct me if my
assumption is incorrect.
right.

If its true, then why would report_error_detected() will return
PCI_ERS_*_NEED_RESET for pci_channel_io_normal case ? If
report_error_detected() requests reset in pci_channel_io_normal
case then I think we should give preference to it.
If we get PCI_ERS_*_NEED_RESET, we should do slot reset, no matter it's a
hotpluggable slot or not.

pci_slot_reset() function itself has dependency on hotplug ops. So
what kind of slot reset is needed for non-hotplug case?

static int pci_slot_reset(struct pci_slot *slot, int probe)
{
int rc;

if (!slot || !pci_slot_resetable(slot))
return -ENOTTY;

if (!probe)
pci_slot_lock(slot);

might_sleep();

rc = pci_reset_hotplug_slot(slot->hotplug, probe);

if (!probe)
pci_slot_unlock(slot);

return rc;
}

static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, int probe)
{
int rc = -ENOTTY;

if (!hotplug || !try_module_get(hotplug->owner))
return rc;

if (hotplug->ops->reset_slot)
rc = hotplug->ops->reset_slot(hotplug, probe);

module_put(hotplug->owner);

return rc;
}
And we shouldn't do it here in reset_link(), that's
two separate things. The `slot reset` done in aer_root_reset() is only for *link
reset*, as there may have some side effects to perform secondary bus reset directly
for hotpluggable slot, as mentioned in commit c4eed62a2143, so it use slot reset
to do the reset link things.

As for slot reset required by the driver, we should perform it later just before the
->slot_reset(). I noticed the TODO comments there and we should implement
it if it's necessary.
I agree.

It lies in line 183, drivers/pcie/err.c:

if (status == PCI_ERS_RESULT_NEED_RESET) {
/*
* TODO: Should call platform-specific
* functions to reset slot before calling
* drivers' slot_reset callbacks?
*/
status = PCI_ERS_RESULT_RECOVERED;
pci_dbg(dev, "broadcast slot_reset message\n");
pci_walk_bus(bus, report_slot_reset, &status);
}