Re: [PATCH] PCI: pciehp: Do not turn off slot if presence comes up after link

From: Bjorn Helgaas
Date: Fri Feb 08 2019 - 14:56:06 EST


On Tue, Feb 05, 2019 at 03:06:56PM -0600, Alexandru Gagniuc wrote:
> According to PCIe 3.0, the presence detect state is a logical OR of
> in-band and out-of-band presence. With this, we'd expect the presence
> state to always be asserted when the link comes up.

Do you have a PCIe 4.0 spec? I think 5.0 is about to come out, so
it'd be nice to have at least a 4.0 citation (including section
number), if you have one.

> Not all hardware follows this, and it is possible for the presence to
> come up after the link. In this case, the PCIe device would be
> erroneously disabled and re-probed. It is possible to distinguish
> between a delayed presence and a card swap by looking at the DLL state
> changed bit -- The link has to come down if the card is removed.
>
> Thus, for a device that is probed, present and has its link active, a
> lack of a link state change event guarantees we have the same device,
> and shutdown of is not needed.

s/of is/is/, I guess?

I'm hoping Lukas will chime in here; thanks for cc'ing him already.

> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@xxxxxxxxx>
> ---
>
> Following some discussion in
> "PCI: hotplug: Erroneous removal of hotplug PCI devices" [1]
> It became apparent that we can't fully rely presence detect changed
> as an indicator to shutdown a device. And in PCIe 4.0 the "logical OR"
> requirement is going away, so we can use a way to slightly decouple
> presence detect and link active.
>
> I think the approach here is the simplest, and least likely to
> interfere with other mis-behaving hardware (in the PCIe 3.0 sense).
>
> [1] https://www.spinics.net/lists/linux-pci/msg79564.html

Would you mind updating this reference to the
https://lore.kernel.org/linux-pci form that doesn't depend on external
organizations?

https://www.kernel.org/lore.html has some details, but unfortunately
lacks a hint about how to construct the URL. What I do is look up the
Message-ID and use, e.g.,

https://lore.kernel.org/linux-pci/<Message-ID>

> drivers/pci/hotplug/pciehp_ctrl.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 3f3df4c29f6e..ea0dd3e956be 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -220,13 +220,23 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> /*
> * If the slot is on and presence or link has changed, turn it off.
> * Even if it's occupied again, we cannot assume the card is the same.
> + * When the card is swapped, we also expect a change in link state,
> + * without which, it's likely presence became high after link-active.
> */
> mutex_lock(&ctrl->state_lock);
> + present = pciehp_card_present(ctrl);
> + link_active = pciehp_check_link_active(ctrl);
> switch (ctrl->state) {
> case BLINKINGOFF_STATE:
> cancel_delayed_work(&ctrl->button_work);
> /* fall through */
> case ON_STATE:
> + if (present && link_active &&
> + !(events & PCI_EXP_SLTSTA_DLLSC)) {
> + mutex_unlock(&ctrl->state_lock);
> + ctrl_warn(ctrl, "Hardware bug: Presence state came up after link");

I'm not 100% sure this is worth KERN_WARN unless the user might be
able to do something about it.

> + return;
> + }
> ctrl->state = POWEROFF_STATE;
> mutex_unlock(&ctrl->state_lock);
> if (events & PCI_EXP_SLTSTA_DLLSC)
> --
> 2.19.2
>