Re: [PATCH v3 1/2] PCI: pciehp: Add support for async hotplug with native AER and DPC/EDR

From: Lukas Wunner
Date: Thu Jun 22 2023 - 05:13:36 EST


On Wed, Jun 21, 2023 at 06:51:51PM +0000, Smita Koralahalli wrote:
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -292,10 +292,68 @@ void dpc_process_error(struct pci_dev *pdev)
> }
> }
>
> +static void pci_clear_surpdn_errors(struct pci_dev *pdev)
> +{
> + u16 reg16;
> + u32 reg32;
> +
> + pci_read_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, &reg32);
> + pci_write_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, reg32);

Make this read+write conditional on "if (pdev->dpc_rp_extensions)"
as the register otherwise doesn't exist.

Wrap to 80 chars per line.


> + pci_read_config_word(pdev, PCI_STATUS, &reg16);
> + pci_write_config_word(pdev, PCI_STATUS, reg16);
> +
> + pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, PCI_EXP_DEVSTA_FED);
> +}

A code comment might be useful here saying that in practice,
Surprise Down errors have been observed to also set error bits
in the Status Register as well as the Fatal Error Detected bit
in the Device Status Register.


> +static void dpc_handle_surprise_removal(struct pci_dev *pdev)
> +{

I'm wondering if we also need

if (!pcie_wait_for_link(pdev, false)) {
pci_info(pdev, "Data Link Layer Link Active not cleared in 1000 msec\n");
goto out;
}

here, similar to dpc_reset_link() and in accordance with PCIe r6.1
sec 6.2.11:

"To ensure that the LTSSM has time to reach the Disabled state
or at least to bring the Link down under a variety of error
conditions, software must leave the Downstream Port in DPC
until the Data Link Layer Link Active bit in the Link Status
Register reads 0b; otherwise, the result is undefined."


> + if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev)) {
> + pci_err(pdev, "failed to retrieve DPC root port on async remove\n");
> + goto out;
> + }

I don't think pci_err() is needed here as dpc_wait_rp_inactive()
already emits a message. (I think I mistakenly gave the advice
to emit an error here in an earlier review comment -- sorry!)


> +
> + pci_aer_raw_clear_status(pdev);
> + pci_clear_surpdn_errors(pdev);
> +
> + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
> + PCI_EXP_DPC_STATUS_TRIGGER);
> +
> +out:
> + clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
> + wake_up_all(&dpc_completed_waitqueue);
> +}
> +
> +static bool dpc_is_surprise_removal(struct pci_dev *pdev)
> +{
> + u16 status;
> +
> + pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);

Wrap to 80 chars.


> +
> + if (!pdev->is_hotplug_bridge)
> + return false;

Move this if-clause to the beginning if the function so that
you omit the unnecessary register read if it's not a hotplug
bridge.


> +
> + if (!(status & PCI_ERR_UNC_SURPDN))
> + return false;
> +
> + return true;
> +}
> +
> static irqreturn_t dpc_handler(int irq, void *context)
> {
> struct pci_dev *pdev = context;
>
> + /*
> + * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async
> + * removal might be unexpected, errors might be reported as a side
> + * effect of the event and software should handle them as an expected
> + * part of this event.
> + */

I think the usual way to reference the spec is "PCIe r6.0 sec 6.7.6".

Maybe that's just me but I find the code comment a little difficult
to parse. Maybe something like the following?

/*
* According to PCIe r6.0 sec 6.7.6, errors are an expected side effect
* of async removal and should be ignored by software.
*/

Thanks,

Lukas

> + if (dpc_is_surprise_removal(pdev)) {
> + dpc_handle_surprise_removal(pdev);
> + return IRQ_HANDLED;
> + }
> +
> dpc_process_error(pdev);
>
> /* We configure DPC so it only triggers on ERR_FATAL */
> --
> 2.17.1
>