RE: [PATCH v2 0/8] Decode IA32/X64 CPER

From: Ghannam, Yazen
Date: Wed Feb 28 2018 - 10:12:20 EST


> -----Original Message-----
> From: Borislav Petkov [mailto:bp@xxxxxxx]
> Sent: Wednesday, February 28, 2018 3:43 AM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>; Tony Luck
> <tony.luck@xxxxxxxxx>
> Cc: linux-efi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> ard.biesheuvel@xxxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH v2 0/8] Decode IA32/X64 CPER
>
> On Mon, Feb 26, 2018 at 01:38:56PM -0600, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> >
> > This series adds decoding for the IA32/X64 Common Platform Error Record.
>
> One much more important thing I forgot about yesterday: how is
> this thing playing into our RAS reporting, x86 decoding chain, etc
> infrastructure?
>

It doesn't right now.

> Is CPER bypassing it completely and the firmware is doing everything
> now? I sure hope not.
>

CPER is the format used for BERT, etc. We'll only ever see a CPER if the
firmware creates it. And it's up to firmware policy what is shared with
the OS.

This set adds decoding for the x86 CPER format, which will mostly map to
core MCA errors. DRAM ECC, etc. will use a separate memory CPER format,
which is not covered here. In other words, common errors like corrected
DRAM ECC won't be reported with this patch set.

Most likely, we'll only see CPERs in BERT because BERT only uses CPERs.
HEST has MCA structures that can be used, so we won't see MCA errors
reported through CPERs during runtime.

So the most common scenario will probably be that the system resets,
firmware found an MCA error during boot, and firmware populates BERT
with a CPER. The error gets printed during OS boot. By the time userspace
is up, the error has already been printed. The error is printed as
informational since there isn't any action for the OS or user to take.

My main reason for printing all the info is that it may be too difficult or too
late to gather that info after the fact. I think this is especially true for boot
errors, though maybe there's another way that I don't know about
(re-reading BERT later?).

> If not, it needs to tie into our infrastructure and the errors need
> to go into the decoding chain where different things look at them and
> filter them.
>

Right. I want to work on getting this more integrated with our existing
x86 infrastructure. But I don't want to wait until we figure all that out
before we have some sort of CPER decoding.

> Tony, what are your plans here?
>
> Perhaps we can finally get MCE decoding on Intel too :-)
>

Thanks,
Yazen