Re: [PATCH v1 3/3] PCI: qcom: Add support for detecting controller level PCIe errors

From: Bjorn Helgaas
Date: Wed Feb 21 2024 - 13:50:32 EST


On Wed, Feb 21, 2024 at 07:34:04PM +0530, root wrote:
> From: Nitesh Gupta <nitegupt@xxxxxxxxxxx>
>
> Synopsys Controllers provide capabilities to detect various controller

"Synopsys controllers"? "Synopsys" refers to the DesignWare core, but
most of this code is in the qcom driver. If it's qcom-specific, this
should say "Qualcomm controllers".

> level errors. These can range from controller interface error to random
> PCIe configuration errors. This patch intends to add support to detect
> these errors and report it to userspace entity via sysfs, which can take
> appropriate actions to mitigate the errors.

s/This patch intends to add/Add/, so the commit log says what the
patch *does*, not "what it intends to do".

> +
> +/*
> + * Error Reporting DBI register
> + */

Typical style in this file (granted, it's not 100% consistent) is to
make these single-line comments, i.e.,

/* Error Reporting DBI register */

> +#define DBI_DEVICE_CONTROL_DEVICE_STATUS 0x78
> +#define DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG 0x8c

Most other #defines in this file use upper-case hex.

> +#define PCIE_AER_EXT_CAP_ID 0x01

Why not the existing PCI_EXT_CAP_ID_ERR? If this is the standard PCIe
AER stuff, we shouldn't make it needlessly device-specific.

> +#define PCI_EXT_CAP_RASDP_ID 0x0b

Looks like possibly PCI_EXT_CAP_ID_VNDR? Capability IDs are
definitely not device-specific. The fact that a PCI_EXT_CAP_ID_VNDR
capability in a device with Vendor ID PCI_VENDOR_ID_QCOM has a
qcom-specific meaning is obviously specific to qcom, but the
Capability ID itself is not.
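
To illustrate why the ID is generic, here is a minimal userspace sketch of an extended capability walk. The config space is a plain byte array, and cfg_read32(), cfg_write32() and find_ext_cap() are hypothetical names made up for this example. The macro values are copied from include/uapi/linux/pci_regs.h as I remember them, so please verify them against your tree. In the driver itself, dw_pcie_find_ext_capability() plays this role.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copied from include/uapi/linux/pci_regs.h so the sketch builds outside
 * the kernel (assumed values; verify against your tree).
 */
#define PCI_CFG_SPACE_SIZE		256
#define PCI_CFG_SPACE_EXP_SIZE		4096
#define PCI_EXT_CAP_ID(header)		((header) & 0x0000ffff)
#define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
#define PCI_EXT_CAP_ID_ERR		0x01
#define PCI_EXT_CAP_ID_VNDR		0x0B

/* The IDs in the patch carry the same values as the generic ones. */
static_assert(PCI_EXT_CAP_ID_ERR == 0x01, "PCIE_AER_EXT_CAP_ID");
static_assert(PCI_EXT_CAP_ID_VNDR == 0x0b, "PCI_EXT_CAP_RASDP_ID");

/* Hypothetical accessors: config space is little-endian. */
static inline uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

static inline void cfg_write32(uint8_t *cfg, unsigned int off, uint32_t val)
{
	cfg[off] = val & 0xff;
	cfg[off + 1] = (val >> 8) & 0xff;
	cfg[off + 2] = (val >> 16) & 0xff;
	cfg[off + 3] = (val >> 24) & 0xff;
}

/* Return the offset of the first capability with this ID, or 0 if absent. */
static inline unsigned int find_ext_cap(const uint8_t *cfg, uint16_t cap_id)
{
	unsigned int pos = PCI_CFG_SPACE_SIZE;
	/* Bound the walk so a malformed list cannot loop forever. */
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;

	while (ttl-- > 0 && pos >= PCI_CFG_SPACE_SIZE) {
		uint32_t header = cfg_read32(cfg, pos);

		if (header == 0)
			break;
		if (PCI_EXT_CAP_ID(header) == cap_id)
			return pos;
		pos = PCI_EXT_CAP_NEXT(header);
	}
	return 0;
}
```

Nothing in the walk depends on the vendor. Only the interpretation of what sits behind a PCI_EXT_CAP_ID_VNDR header does.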

> +/* DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG register fields */
> +#define PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN BIT(0)
> +#define PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN BIT(1)
> +#define PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN BIT(2)
> +
> +/* DBI_DEVICE_CONTROL_DEVICE_STATUS register fields */
> +#define PCIE_CAP_UNSUPPORT_REQ_REP_EN BIT(3)
> +#define PCIE_CAP_FATAL_ERR_REPORT_EN BIT(2)
> +#define PCIE_CAP_NON_FATAL_ERR_REPORT_EN BIT(1)
> +#define PCIE_CAP_CORR_ERR_REPORT_EN BIT(0)

These look like alternate ways to access the generic PCIe Capability.
If that's the case, either use the existing PCI_EXP_RTCTL_SECEE,
PCI_EXP_DEVCTL_CERE, etc., or at least match the "RTCTL_SECEE" parts
of the names so we can see the connection.
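
A quick arithmetic check of that suspicion, as a sketch only. The PCI_EXP_* values are copied from include/uapi/linux/pci_regs.h as I remember them (please verify), and implied_cap_base() is a hypothetical helper. Subtracting the generic register offsets from both DBI offsets gives the same base, 0x70. That base is inferred here and is not stated anywhere in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Generic PCIe Capability definitions, copied from
 * include/uapi/linux/pci_regs.h (assumed values; verify against your tree).
 */
#define PCI_EXP_DEVCTL		0x08
#define PCI_EXP_DEVCTL_CERE	0x0001
#define PCI_EXP_DEVCTL_NFERE	0x0002
#define PCI_EXP_DEVCTL_FERE	0x0004
#define PCI_EXP_DEVCTL_URRE	0x0008
#define PCI_EXP_RTCTL		0x1c
#define PCI_EXP_RTCTL_SECEE	0x0001
#define PCI_EXP_RTCTL_SENFEE	0x0002
#define PCI_EXP_RTCTL_SEFEE	0x0004

/* Definitions as they appear in the patch. */
#define BIT(n)					(1U << (n))
#define DBI_DEVICE_CONTROL_DEVICE_STATUS	0x78
#define DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG	0x8c
#define PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN		BIT(0)
#define PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN	BIT(1)
#define PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN	BIT(2)
#define PCIE_CAP_UNSUPPORT_REQ_REP_EN		BIT(3)
#define PCIE_CAP_FATAL_ERR_REPORT_EN		BIT(2)
#define PCIE_CAP_NON_FATAL_ERR_REPORT_EN	BIT(1)
#define PCIE_CAP_CORR_ERR_REPORT_EN		BIT(0)

/* Each patch field lines up with an existing generic name. */
static_assert(PCIE_CAP_CORR_ERR_REPORT_EN == PCI_EXP_DEVCTL_CERE, "CERE");
static_assert(PCIE_CAP_NON_FATAL_ERR_REPORT_EN == PCI_EXP_DEVCTL_NFERE, "NFERE");
static_assert(PCIE_CAP_FATAL_ERR_REPORT_EN == PCI_EXP_DEVCTL_FERE, "FERE");
static_assert(PCIE_CAP_UNSUPPORT_REQ_REP_EN == PCI_EXP_DEVCTL_URRE, "URRE");
static_assert(PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN == PCI_EXP_RTCTL_SECEE, "SECEE");
static_assert(PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN == PCI_EXP_RTCTL_SENFEE, "SENFEE");
static_assert(PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN == PCI_EXP_RTCTL_SEFEE, "SEFEE");

/* Hypothetical helper: capability base implied by a DBI register offset. */
static inline uint32_t implied_cap_base(uint32_t dbi_off, uint32_t reg_off)
{
	return dbi_off - reg_off;
}
```

If the capability really is at that base, the driver could look it up at runtime and add PCI_EXP_DEVCTL or PCI_EXP_RTCTL instead of hard-coding 0x78 and 0x8c.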

> +/* DBI_ADV_ERR_CAP_CTRL_OFF register fields */
> +#define ECRC_GEN_EN BIT(6)
> +#define ECRC_CHECK_EN BIT(8)

Do these correspond to PCI_ERR_CAP_ECRC_GENE, PCI_ERR_CAP_ECRC_CHKE?

> +/* DBI_ROOT_ERR_CMD_OFF register fields */
> +#define CORR_ERR_REPORTING_EN BIT(0)
> +#define NON_FATAL_ERR_REPORTING_EN BIT(1)
> +#define FATAL_ERR_REPORTING_EN BIT(2)

Same question here: do these correspond to PCI_ERR_ROOT_CMD_COR_EN,
etc.?
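
For the two sets of AER fields above, the bit positions at least match the generic definitions. The sketch below checks that at compile time and shows the read-modify-write values built from the generic names. The PCI_ERR_* values are copied from include/uapi/linux/pci_regs.h as I remember them (please verify), and the two helper functions are hypothetical. Checking the "capable" bits before setting the ECRC enables is my addition and is not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * AER definitions copied from include/uapi/linux/pci_regs.h
 * (assumed values; verify against your tree).
 */
#define PCI_ERR_CAP_ECRC_GENC		0x00000020	/* ECRC generation capable */
#define PCI_ERR_CAP_ECRC_GENE		0x00000040	/* ECRC generation enable */
#define PCI_ERR_CAP_ECRC_CHKC		0x00000080	/* ECRC check capable */
#define PCI_ERR_CAP_ECRC_CHKE		0x00000100	/* ECRC check enable */
#define PCI_ERR_ROOT_CMD_COR_EN		0x00000001
#define PCI_ERR_ROOT_CMD_NONFATAL_EN	0x00000002
#define PCI_ERR_ROOT_CMD_FATAL_EN	0x00000004

/* Definitions as they appear in the patch. */
#define BIT(n)				(1U << (n))
#define ECRC_GEN_EN			BIT(6)
#define ECRC_CHECK_EN			BIT(8)
#define CORR_ERR_REPORTING_EN		BIT(0)
#define NON_FATAL_ERR_REPORTING_EN	BIT(1)
#define FATAL_ERR_REPORTING_EN		BIT(2)

static_assert(ECRC_GEN_EN == PCI_ERR_CAP_ECRC_GENE, "ECRC gen");
static_assert(ECRC_CHECK_EN == PCI_ERR_CAP_ECRC_CHKE, "ECRC check");
static_assert(CORR_ERR_REPORTING_EN == PCI_ERR_ROOT_CMD_COR_EN, "COR");
static_assert(NON_FATAL_ERR_REPORTING_EN == PCI_ERR_ROOT_CMD_NONFATAL_EN, "NONFATAL");
static_assert(FATAL_ERR_REPORTING_EN == PCI_ERR_ROOT_CMD_FATAL_EN, "FATAL");

/* Hypothetical helper: enable ECRC only where the hardware says it can. */
static inline uint32_t aer_cap_enable_ecrc(uint32_t cap)
{
	if (cap & PCI_ERR_CAP_ECRC_GENC)
		cap |= PCI_ERR_CAP_ECRC_GENE;
	if (cap & PCI_ERR_CAP_ECRC_CHKC)
		cap |= PCI_ERR_CAP_ECRC_CHKE;
	return cap;
}

/* Hypothetical helper: new Root Error Command value with all three enables. */
static inline uint32_t aer_root_cmd_enable_all(uint32_t cmd)
{
	return cmd | PCI_ERR_ROOT_CMD_COR_EN | PCI_ERR_ROOT_CMD_NONFATAL_EN |
	       PCI_ERR_ROOT_CMD_FATAL_EN;
}
```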

> +static void qcom_pcie_enable_error_reporting_2_7_0(struct qcom_pcie *pcie)
> +{
> + ...

> + val = readl(pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);
> + val |= (PCIE_CAP_CORR_ERR_REPORT_EN | PCIE_CAP_NON_FATAL_ERR_REPORT_EN |
> + PCIE_CAP_FATAL_ERR_REPORT_EN | PCIE_CAP_UNSUPPORT_REQ_REP_EN);
> + writel(val, pci->dbi_base + DBI_DEVICE_CONTROL_DEVICE_STATUS);

Is there any way to split the AER part (specified by the PCIe spec)
from the qcom-specific (or dwc-specific) part? This looks an awful
lot like pci_enable_pcie_error_reporting(), and we should do this in
the PCI core in a generic way if possible.
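
Roughly the split I have in mind, as a userspace sketch rather than kernel code. Config space is a byte array, every function name here is hypothetical, and the PCI_EXP_* values are copied from include/uapi/linux/pci_regs.h as I remember them (please verify). The only controller-specific input is where the PCIe Capability lives. In the kernel, the generic half would be pcie_capability_set_word() on the Root Port's pci_dev.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copied from include/uapi/linux/pci_regs.h
 * (assumed values; verify against your tree).
 */
#define PCI_EXP_DEVCTL		0x08
#define PCI_EXP_DEVCTL_CERE	0x0001
#define PCI_EXP_DEVCTL_NFERE	0x0002
#define PCI_EXP_DEVCTL_FERE	0x0004
#define PCI_EXP_DEVCTL_URRE	0x0008

/* Hypothetical name for the set of enables the patch ORs in. */
#define AER_DEVCTL_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
				 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)

/* Hypothetical 16-bit accessors: config space is little-endian. */
static inline uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static inline void cfg_write16(uint8_t *cfg, unsigned int off, uint16_t val)
{
	cfg[off] = val & 0xff;
	cfg[off + 1] = (val >> 8) & 0xff;
}

/* Generic part: read-modify-write one 16-bit PCIe Capability register. */
static inline void pcie_cap_set_word(uint8_t *cfg, unsigned int cap_base,
				     unsigned int reg, uint16_t set)
{
	uint16_t val = cfg_read16(cfg, cap_base + reg);

	cfg_write16(cfg, cap_base + reg, val | set);
}

/* The caller supplies only the capability base; the rest is per the spec. */
static inline void enable_error_reporting(uint8_t *cfg, unsigned int cap_base)
{
	pcie_cap_set_word(cfg, cap_base, PCI_EXP_DEVCTL, AER_DEVCTL_FLAGS);
}
```

Device Control is a 16-bit register, so this touches only those two bytes and leaves the adjacent Device Status half alone.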

> + val = readl(pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);
> + val |= (PCIE_CAP_SYS_ERR_ON_CORR_ERR_EN | PCIE_CAP_SYS_ERR_ON_NON_FATAL_ERR_EN |
> + PCIE_CAP_SYS_ERR_ON_FATAL_ERR_EN);
> + writel(val, pci->dbi_base + DBI_ROOT_CONTROL_ROOT_CAPABILITIES_REG);

Bjorn