Re: [PATCH RESEND] PCI/AER: Use a common function to print AER error bits

From: Alex G.
Date: Sat Apr 28 2018 - 13:07:58 EST


On 04/28/2018 11:46 AM, Alex G. wrote:
On 04/27/2018 05:43 PM, Bjorn Helgaas wrote:
On Tue, Apr 17, 2018 at 12:09:43PM -0500, Alexandru Gagniuc wrote:
(snip)
+    memset(&info, 0, sizeof(info));
+    info.severity = aer_severity;
+    info.status = status;
+    info.mask = mask;
+    info.first_error = 0x1f;

I like this patch a lot, but where does this "first_error = 0x1f" come
from?

The aer_(un)correctable_error_string tables don't go up to [0x1f], so this guarantees that we never print "(First)".

I assume this is supposed to be the "First Error Pointer" in the
Advanced Error Capabilities and Control register (PCIe r4.0, sec
7.8.4.7). There is a "cap_control" field in struct
aer_capability_regs; should we be using that here?

There is a way to extract it from the PCI regs, and it's quite simple. IIRC, it should be all f's when the capability is not implemented. I wanted to avoid any further parsing of PCI regs in this patch.

I could update the offending line to say:
+ info.first_error = PCI_ERR_CAP_FEP(aer->cap_control);
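To make the two options concrete, here is a minimal user-space sketch, not the kernel code itself. SKETCH_ERR_CAP_FEP mirrors what PCI_ERR_CAP_FEP() does (keep the low five bits of cap_control). sketch_print_bits() and its named_bits parameter are hypothetical names invented for this illustration. It assumes, as described above, that the printing loop only visits bits that have an entry in the string table.

```c
#include <stdint.h>
#include <stdio.h>

/* Same masking as the kernel's PCI_ERR_CAP_FEP(): the First Error
 * Pointer is the low 5 bits of the Advanced Error Capabilities and
 * Control register. Redefined so the sketch builds outside the kernel. */
#define SKETCH_ERR_CAP_FEP(x) ((x) & 0x1f)

/*
 * Hypothetical helper: print one line per set bit in 'status', but only
 * for bits below 'named_bits' (an assumption standing in for the size
 * of the error string table). A line is tagged "(First)" when its bit
 * index equals 'first_error'.
 *
 * Returns how many lines were tagged "(First)".
 */
static int sketch_print_bits(uint32_t status, unsigned int first_error,
                             unsigned int named_bits, FILE *out)
{
    int tagged = 0;
    unsigned int i;

    for (i = 0; i < named_bits && i < 32; i++) {
        int is_first;

        if (!(status & (UINT32_C(1) << i)))
            continue;

        is_first = (i == first_error);
        fprintf(out, "   [%2u] error bit%s\n", i,
                is_first ? " (First)" : "");
        tagged += is_first;
    }
    return tagged;
}
```

With first_error = 0x1f and a table shorter than 32 entries, no visited index can match, so nothing is tagged. With first_error = SKETCH_ERR_CAP_FEP(cap_control), the bit named by the register is tagged if it is set in status.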

Though I still have concerns about validating CPER data:

I can see a way to use even more common printk code, but that requires validating the PCI regs we get from firmware. That means we need to make a guarantee about CPER that is beyond the scope of this patch.

Alex