Re: [PATCH 2/3] x86/mce: Fix incorrect "Machine check from unknown source" message

From: Raj, Ashok
Date: Tue May 29 2018 - 14:22:27 EST


On Mon, May 28, 2018 at 10:49:23PM +0200, Borislav Petkov wrote:
> On Fri, May 25, 2018 at 02:41:55PM -0700, Tony Luck wrote:
> > @@ -1287,12 +1292,17 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> > no_way_out = worst >= MCE_PANIC_SEVERITY;
> > } else {
> > /*
> > - * Local MCE skipped calling mce_reign()
> > - * If we found a fatal error, we need to panic here.
> > + * If there was a fatal machine check we should have
> > + * already called mce_panic earlier in this function.
> > + * Since we re-read the banks, we might have found
> > + * something new. Check again to see if we found a
> > + * fatal error. We call "mce_severity()" again to
> > + * make sure we have the right "msg".
> > */
> > - if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
> > - mce_panic("Machine check from unknown source",
> > - NULL, NULL);
> > + if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
> > + severity = mce_severity(&m, cfg->tolerant, &msg, true);
> > + mce_panic("Local fatal machine check!", &m, msg);

If this doesn't affect mcelog parsing, would it make sense to change this from
"fatal" to "unrecoverable"? "Fatal" typically implies PCC=1 on x86, but some of
these cases are software recoverable; it is just that the kernel isn't able to
perform the recovery.
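
To make the distinction concrete, here is a rough sketch in plain userspace C.
It is not the kernel's mce_severity() rule table and it ignores most of what
that table checks. The bit positions are the architectural IA32_MCi_STATUS
ones; enum panic_class, classify_status(), panic_prefix() and the
kernel_can_recover flag are names made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in IA32_MCi_STATUS. */
#define MCI_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */
#define MCI_STATUS_S   (1ULL << 56) /* signaled via machine check */
#define MCI_STATUS_AR  (1ULL << 55) /* action required */

/* Hypothetical classes for illustration; not the kernel's severity enum. */
enum panic_class {
	PANIC_NONE,          /* nothing in this bank forces a panic */
	PANIC_FATAL,         /* PCC=1: hardware reports corrupt context */
	PANIC_UNRECOVERABLE, /* PCC=0, but the kernel cannot recover */
};

/*
 * Classify one bank's status word. kernel_can_recover is an assumed
 * input standing in for "the kernel has a recovery path for this
 * error", e.g. the error hit user memory rather than kernel context.
 */
enum panic_class classify_status(uint64_t status, bool kernel_can_recover)
{
	/* Only valid, uncorrected errors are candidates for a panic. */
	if (!(status & MCI_STATUS_VAL) || !(status & MCI_STATUS_UC))
		return PANIC_NONE;

	/* PCC=1 is the case the word "fatal" usually suggests. */
	if (status & MCI_STATUS_PCC)
		return PANIC_FATAL;

	/* S=1, AR=1, PCC=0: software recoverable, action required. */
	if ((status & MCI_STATUS_S) && (status & MCI_STATUS_AR) &&
	    !kernel_can_recover)
		return PANIC_UNRECOVERABLE;

	return PANIC_NONE;
}

/* Pick the wording for the panic message from the class. */
const char *panic_prefix(enum panic_class c)
{
	switch (c) {
	case PANIC_FATAL:
		return "Fatal local machine check";
	case PANIC_UNRECOVERABLE:
		return "Unrecoverable local machine check";
	default:
		return "";
	}
}
```

With something along these lines, a bank with VAL|UC|S|AR set and no recovery
path would report "Unrecoverable", and "Fatal" would be kept for PCC=1.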


>
> Haha, this would still make you look at the code to remember was it
> "fatal local" or "local fatal" the second one. Yeah, there's the "!" but
> still.
>
> How about:
>
> "Fatal local machine check after banks scan"
>
> or so.
>
> Btw, the code in do_machine_check() has become one helluva spaghetti
> mess. It could use a bit of cleanup... :)
>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> --