Re: [PATCH] PCI: pci-hyperv: Retry PCI bus D0 entry when the first attempt failed with invalid device state 0xC0000184.

From: Bjorn Helgaas
Date: Mon Apr 27 2020 - 18:51:37 EST


Please pay attention to the changelog history and make yours match:

$ git log --oneline drivers/pci/controller/pci-hyperv.c
1cf106d93245 ("PCI: hv: Introduce hv_msi_entry")
61bfd920abbf ("PCI: hv: Move retarget related structures into tlfs header")
b00f80fcfaa0 ("PCI: hv: Move hypercall related definitions into tlfs header")
067fb6c97e7e ("PCI: hv: Replace zero-length array with flexible-array member")
999dd956d838 ("PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2")
f9ad0f361cf3 ("PCI: hv: Decouple the func definition in hv_dr_state from VSP message")
42c3d41832ef ("PCI: hv: Add missing kfree(hbus) in hv_pci_probe()'s error handling path")
e658a4fea8ef ("PCI: hv: Remove unnecessary type casting from kzalloc")

No period at end of subject.

On Sun, Apr 26, 2020 at 09:24:30PM +0800, Wei Hu wrote:
> In the case of kdump, the PCI device was not cleanly shut down
> before the kdump kernel starts. This causes the initial
> attempt of entering D0 state in the kdump kernel to fail with
> invalid device state 0xC0000184 returned from Hyper-V host.
> When this happens, explicitly call PCI bus exit and retry to
> enter the D0 state.
>
> Also fix the PCI probe failure path to release the PCI device
> resource properly.

This sounds like two separate fixes that should be in separate
patches?

> Signed-off-by: Wei Hu <weh@xxxxxxxxxxxxx>
> ---
> drivers/pci/controller/pci-hyperv.c | 34 ++++++++++++++++++++++++++++-
> 1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index e15022ff63e3..eb4781fa058d 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2736,6 +2736,10 @@ static void hv_free_config_window(struct hv_pcibus_device *hbus)
> vmbus_free_mmio(hbus->mem_config->start, PCI_CONFIG_MMIO_LENGTH);
> }
>
> +#define STATUS_INVALID_DEVICE_STATE 0xC0000184
> +
> +static int hv_pci_bus_exit(struct hv_device *hdev, bool hibernating);
> +
> /**
> * hv_pci_enter_d0() - Bring the "bus" into the D0 power state
> * @hdev: VMBus's tracking struct for this root PCI bus
> @@ -2748,8 +2752,10 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
> struct pci_bus_d0_entry *d0_entry;
> struct hv_pci_compl comp_pkt;
> struct pci_packet *pkt;
> + bool retry = true;
> int ret;
>
> +enter_d0_retry:
> /*
> * Tell the host that the bus is ready to use, and moved into the
> * powered-on state. This includes telling the host which region
> @@ -2780,6 +2786,30 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
> dev_err(&hdev->device,
> "PCI Pass-through VSP failed D0 Entry with status %x\n",
> comp_pkt.completion_status);
> +
> + /*
> + * In certain case (Kdump) the pci device of interest was
> + * not cleanly shut down and resource is still held on host
> + * side, the host could return STATUS_INVALID_DEVICE_STATE.
> + * We need to explicitly request host to release the resource
> + * and try to enter D0 again.
> + */
> + if (comp_pkt.completion_status == STATUS_INVALID_DEVICE_STATE &&
> + retry) {
> + ret = hv_pci_bus_exit(hdev, true);
> +
> + retry = false;
> +
> + if (ret == 0) {
> + kfree(pkt);
> + goto enter_d0_retry;
> + } else {
> + dev_err(&hdev->device,
> + "PCI bus D0 exit failed with ret %d\n",
> + ret);
> + }
> + }
> +
> ret = -EPROTO;
> goto exit;
> }
> @@ -3136,7 +3166,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>
> ret = hv_pci_allocate_bridge_windows(hbus);
> if (ret)
> - goto free_irq_domain;
> + goto exit_d0;
>
> ret = hv_send_resources_allocated(hdev);
> if (ret)
> @@ -3154,6 +3184,8 @@ static int hv_pci_probe(struct hv_device *hdev,
>
> free_windows:
> hv_pci_free_bridge_windows(hbus);
> +exit_d0:
> + (void) hv_pci_bus_exit(hdev, true);
> free_irq_domain:
> irq_domain_remove(hbus->irq_domain);
> free_fwnode:
> --
> 2.20.1
>