Re: PCI: pci_restore_state() is returning 0 when it fails

From: Breno Leitao
Date: Mon Nov 16 2009 - 09:13:56 EST


Hi Rafael,

Rafael J. Wysocki wrote:
> On Friday 13 November 2009, Breno Leitao wrote:
>> Actually pci_restore_state() is returning 0 if the restore process
>> fails, instead of a error value.
>>
>> If it fails, I believe that it should return -EPERM, once that
>> it is an invalid operation and probably pci_save_state() wasn't
>> called.
>
> I believe this patch will break a number of things.
Well, I checked it, and found that there are around 10 places that
really verify the return value for this function, and almost all of them
do the correct thing, and the patch doesn't seem to break any of them
except a specific case in the drivers/net/sfc/falcon.c file, that contains:

if (FALCON_IS_DUAL_FUNC(efx)) {
rc = pci_restore_state(nic_data->pci_dev2);
if (rc) {
EFX_ERR(efx, "failed to restore PCI config for "
"the secondary function\n");
goto fail3;
}
}
rc = pci_restore_state(efx->pci_dev);
if (rc) {
EFX_ERR(efx, "failed to restore PCI config for the "
"primary function\n");
goto fail4;


In this case, if FALCON_IS_DUAL_FUNC(efx) returns true, then the next
pci_restore_state(efx->pci_dev) will return -1, and cause the "failed to
restore PCI config for the primary function" error.
That's because the code is calling pci_restore_state() twice without calling
pci_save_state() in the middle.
Since this seems to be the only place that will be broken, and the fix is
trivial, I believe that the patch can be applied smoothly.

> Does it actually fix any problem you have observed?
No, but we use this function to resume drivers after PPC machines detects
errors (EEH). And before your patch, we basically save the state once
(in the init function), and then call pci_restore_state() every time the
machine detects an error. After your patch, it's not possible anymore,
because pci_restore_state() will not restore the state after the
first restore. So, I'd like to instrument some drivers to check if it's
possible to recover or not.

I also believe that returning 0 for a failed function isn't a good plan.

Thanks for the review,
Breno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/