Re: [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming

From: Ilpo Järvinen
Date: Tue Jan 30 2024 - 08:22:07 EST


On Mon, 29 Jan 2024, Bjorn Helgaas wrote:

> On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote:
> > On runtime resume, pci_dev_wait() is called:
> > pci_pm_runtime_resume()
> > pci_pm_bridge_power_up_actions()
> > pci_bridge_wait_for_secondary_bus()
> > pci_dev_wait()
> >
> > While a device is runtime suspended along with its PCIe hierarchy, the
> > device could get disconnected. In such case, the link will not come up
> > no matter how log pci_dev_wait() waits for it.
>
> s/PCIe/PCI/ (unless this is a PCIe-specific thing)
> s/log/long/
>
> > Besides the above mentioned case, there could be other ways to get the
> > device disconnected while pci_dev_wait() is waiting for the link to
> > come up.
> >
> > Make pci_dev_wait() to exit if the device is already disconnected to
> > avoid unnecessary delay. As disconnected device is not really even a
> > failure in the same sense as link failing to come up for whatever
> > reason, return 0 instead of errno.
>
> The device being disconnected is not the same as a link failure.

This we agree and it's what I tried to write above.

> Do
> all the callers do the right thing if pci_dev_wait() returns success
> when there's no device there?

It's a bit complicated. I honestly don't know what is the best approach
here so I'm very much open to your suggestion what would be preferrable.

There are two main use cases (more than two callers but they seem boil
down to two use cases).

One use case is reset related functions and those would probably prefer to
have error returned if the wait, and thus reset, failed.

Then the another is wait for buses, that is,
pci_bridge_wait_for_secondary_bus() which return 0 if there's no device
(wait successful). For it, it would make sense to return 0 also when
device is disconnected because it seems analoguous to the case where
there's no device in the first place.

Perhaps it would be better to return -ENOTTY from pci_dev_wait() and check
for that disconnected condition inside
pci_bridge_wait_for_secondary_bus()? To further complicate things,
however, DPC also depends on the return value of
pci_bridge_wait_for_secondary_bus() and from its perspective, returning
error when there is a device that is disconnected might be the desirable
alternative (but I'm not fully sure how everything in DPC works but I
highly suspect I'm correct here).

Either way, the fix itself does care and seemed to work regardless of the
actual value returned by pci_dev_wait().

> > Also make sure compiler does not become too clever with
> > dev->error_state and use READ_ONCE() to force a fetch for the
> > up-to-date value.
>
> I think we should have a comment there to say why READ_ONCE() is
> needed. Otherwise it's hard to know whether a future change might
> make it unnecessary.

Sure, I'll add a comment there.

--
i.