Re: [PATCH v2 1/2] PCI/AER: Allow Native AER Host Bridges to use AER

From: Bjorn Helgaas
Date: Fri Apr 24 2020 - 19:30:26 EST


Hi Jon,

I'm glad you raised this because I think the way we handle
FIRMWARE_FIRST is really screwed up.

On Mon, Apr 20, 2020 at 03:37:09PM -0600, Jon Derrick wrote:
> Some platforms have a mix of ports whose capabilities can be negotiated
> by _OSC, and some ports which are not described by ACPI and instead
> managed by Native drivers. The existing Firmware-First HEST model can
> incorrectly tag these Native, Non-ACPI ports as Firmware-First managed
> ports by advertising the HEST Global Flag and matching the type and
> class of the port (aer_hest_parse).
>
> If the port requests Native AER through the Host Bridge's capability
> settings, the AER driver should honor those settings and allow the port
> to bind. This patch changes the definition of Firmware-First to exclude
> ports whose Host Bridges request Native AER.
>
> Signed-off-by: Jon Derrick <jonathan.derrick@xxxxxxxxx>
> ---
> drivers/pci/pcie/aer.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index f4274d3..30fbd1f 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -314,6 +314,9 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev)
> if (pcie_ports_native)
> return 0;
>
> + if (pci_find_host_bridge(dev->bus)->native_aer)
> + return 0;

I hope we don't have to complicate pcie_aer_get_firmware_first() by
adding this "native_aer" check here. I'm not sure what we actually
*should* do based on FIRMWARE_FIRST, but I don't think the current
uses really make sense.

I think Linux makes too many assumptions based on the FIRMWARE_FIRST
bit. The ACPI spec really only says (ACPI v6.3, sec 18.3.2.4):

If set, FIRMWARE_FIRST indicates to the OSPM that system firmware
will handle errors from this source first.

If FIRMWARE_FIRST is set in the flags field, the Enabled field [of
the HEST AER structure] is ignored by the OSPM.

I do not see anything there about who owns the AER Capability, but
Linux assumes that if FIRMWARE_FIRST is set, firmware must own the AER
Capability. I think that's reading too much into the spec.

We already have _OSC, which *does* explicitly talk about who owns the
AER Capability, and I think we should rely on that. If firmware
doesn't want the OS to touch the AER Capability, it should decline to
give ownership to the OS via _OSC.

> if (!dev->__aer_firmware_first_valid)
> aer_set_firmware_first(dev);
> return dev->__aer_firmware_first;
> --
> 1.8.3.1
>