Re: [PATCH v2 0/5] Parse the PCIe AER and set to relevant registers

From: Bjorn Helgaas
Date: Wed Apr 12 2023 - 12:32:09 EST


On Wed, Apr 12, 2023 at 05:11:28PM +0800, LeoLiuoc wrote:
> On 2023/4/8 7:18, Bjorn Helgaas wrote:
> > On Tue, Nov 15, 2022 at 11:11:15AM +0800, LeoLiu-oc wrote:
> > > From: leoliu-oc <leoliu-oc@xxxxxxxxxxx>
> > >
> > > According to sections 18.3.2.4, 18.3.2.5 and 18.3.2.6 of ACPI r6.5, the
> > > register values from the HEST PCI Express AER Structure should be written
> > > to the relevant PCIe device's AER Capabilities. So the purpose of the patch set
> > > is to extract register values from HEST PCI Express AER structures and
> > > program them into AER Capabilities. Refer to the ACPI Spec r6.5 for a more
> > > detailed description.
> >
> > I wasn't involved in this part of the ACPI spec, and I don't
> > understand how this is intended to work.
> >
> > I see that this series extracts AER mask, severity, and control
> > information from the ACPI HEST table and uses it to configure PCIe
> > devices as they are enumerated.
> >
> > What I don't understand is how this relates to ownership of the AER
> > capability as negotiated by the _OSC method. Firmware can configure
> > the AER capability itself, and if it retains control of the AER
> > capability, the OS can't write to it (with the exception of clearing
> > EDR error status), so this wouldn't be necessary.
>
> There is no relationship between the ownership of the AER related
> register and the ownership of the AER capability in the OS or
> Firmware.

I don't understand this; can you say it another way? "Ownership of
the AER related register" and "ownership of the AER capability" sound
exactly the same to me.

> The processing here initializes the AER-related registers; it does
> not handle AER events. If firmware alone configures the AER
> registers, it cannot handle the runtime hot reset and link retrain
> cases, in addition to the hotplug case you mentioned below.
>
> > If the OS owns the AER capability, I assume it gets to decide for
> > itself how to configure AER, no matter what the ACPI HEST says.
>
> What information does the OS use to decide how to configure AER? The
> ACPI Spec has the following description: PCI Express (PCIe) root
> ports may implement PCIe Advanced Error Reporting (AER) support.
> This table (HEST) contains information platform firmware supplies to
> OSPM for configuring AER support on a given root port. We understand
> HEST as a way for the user to express the expected configuration.
>
> In the current implementation, the OS already configures a PCIe
> device based on the _HPP/_HPX methods, both when configuring a PCI
> device inserted into a hot-plug slot and when doing the initial
> configuration of a PCI device at system boot. HEST is just another
> way to express the user's desired configuration.

Why was the HEST mechanism added if the functionality is equivalent
to the existing _HPP/_HPX? There must be something that HEST supplies
that _HPP/_HPX did not.

I think we need some things in the commit log (and short comments in
the code) to help maintain this in the future:

  - What problem does this solve, e.g., is there some bug that happens
    because we lack this functionality?

  - How is this HEST mechanism related to _HPP/_HPX? What are the
    differences?

  - How is this related to _OSC AER ownership?

I think we ignore _OSC ownership in the existing _HPP/_HPX code, but
that seems like a potential problem. The PCI Firmware spec (r3.3, sec
4.5.1) is pretty clear:

  If control of this feature was requested and denied or was not
  requested, firmware returns this bit set to 0, and the operating
  system must not modify the Advanced Error Reporting Capability or
  the other error enable/status bits listed above.
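
A rough sketch of the gating that the spec text above implies, in plain
userspace C. Every name here (struct hest_aer_values, struct aer_regs,
apply_hest_aer(), the os_owns_aer flag) is made up for illustration;
none of it is an existing kernel interface, and it is not what the
posted series does. The only point is that the HEST values would be
applied only when _OSC granted the OS control of AER:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copy of the mask/severity/control values from a HEST entry. */
struct hest_aer_values {
	uint32_t uncor_mask;
	uint32_t uncor_severity;
	uint32_t cor_mask;
	uint32_t adv_cap_ctrl;
};

/* Hypothetical stand-in for the writable registers of a device's AER capability. */
struct aer_regs {
	uint32_t uncor_mask;
	uint32_t uncor_severity;
	uint32_t cor_mask;
	uint32_t adv_cap_ctrl;
};

/*
 * Copy the HEST values into the AER registers only if the OS was granted
 * AER control via _OSC (os_owns_aer is assumed to hold that result).
 * Returns true if the registers were written, false if they were left alone.
 */
bool apply_hest_aer(bool os_owns_aer, const struct hest_aer_values *hest,
		    struct aer_regs *regs)
{
	if (!os_owns_aer)
		return false;	/* firmware retained AER: do not touch it */

	regs->uncor_mask = hest->uncor_mask;
	regs->uncor_severity = hest->uncor_severity;
	regs->cor_mask = hest->cor_mask;
	regs->adv_cap_ctrl = hest->adv_cap_ctrl;
	return true;
}
```

If firmware retained ownership, the function leaves the registers
untouched, which is the case where I don't see how the OS could do this
configuration at all.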

> > Maybe this is intended for the case where firmware retains AER
> > ownership but the OS uses native hotplug (pciehp), and this is a way
> > for the OS to configure new devices as the firmware expects? But in
> > that case, we still have the problem that the OS can't write to the
> > AER capability to do this configuration.
> >
> > Bjorn