Re: [RFC PATCH] x86: Do not panic if mce=2 is passed

From: Borislav Petkov
Date: Sun Sep 18 2016 - 14:39:15 EST


On Fri, Sep 16, 2016 at 08:28:44PM +0000, Luck, Tony wrote:
> > For UE recovery support, currently we need mce=2 in command line
> > and also disable panic_on_oops with sysctl.
>
> Please explain. I've never given mce=2 on command line, and have
> had my kernel recover from thousands of (injected) UE memory errors.

So frankly, that panic_on_oops doesn't make a whole lotta sense to me.

It is promoting MCEs with severity MCE_UC_SEVERITY and higher to a
panic.

So let's look at those:

MCE_UC_SEVERITY    - we don't do anything special in the kernel for
                     those so just as well.
MCE_AR_SEVERITY    - those end up in the memory failure code if
                     they're memory errors.
MCE_PANIC_SEVERITY - causes panic.
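
To make the promotion concrete, here is a rough sketch of what "UC and higher becomes a panic" amounts to. Only the three severity names come from the list above; SEV_BELOW_UC and promote() are made up for illustration, and this is not the actual grading code:

```c
/* Toy model. Only the three MCE_* names come from the mail. */
enum severity {
    SEV_BELOW_UC,        /* hypothetical stand-in for all lower levels */
    MCE_UC_SEVERITY,
    MCE_AR_SEVERITY,
    MCE_PANIC_SEVERITY,
};

/* With panic_on_oops set, anything graded UC or worse becomes a panic. */
enum severity promote(enum severity sev, int panic_on_oops)
{
    if (panic_on_oops && sev >= MCE_UC_SEVERITY)
        return MCE_PANIC_SEVERITY;
    return sev;
}
```

With panic_on_oops set, an AR error would never reach the memory failure code in this model, which is the part that looks questionable.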

so if anything, panic_on_oops shouldn't control the panicking behavior
as tolerant does that already:

* Tolerant levels:
* 0: always panic on uncorrected errors, log corrected errors
* 1: panic or SIGBUS on uncorrected errors, log corrected errors
* 2: SIGBUS or log uncorrected errors (if possible), log corr. errors
* 3: never panic or SIGBUS, log all errors (for testing only)
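
Read literally, that comment boils down to something like the sketch below. It is only a toy model of the comment text, not the kernel's control flow; tolerant_action(), the enum and the can_sigbus flag (meaning "the error can be contained by killing the affected task") are assumptions made up for illustration:

```c
#include <stdbool.h>

/* Outcomes named in the tolerant-level comment. */
enum mce_action { ACT_LOG, ACT_SIGBUS, ACT_PANIC };

/*
 * Toy model, NOT the kernel's logic. can_sigbus is a hypothetical
 * flag: true if killing the affected task would contain the error.
 */
enum mce_action tolerant_action(int tolerant, bool uncorrected,
                                bool can_sigbus)
{
    if (!uncorrected)
        return ACT_LOG;     /* every level just logs corrected errors */

    switch (tolerant) {
    case 0:
        return ACT_PANIC;                           /* always panic */
    case 1:
        return can_sigbus ? ACT_SIGBUS : ACT_PANIC; /* panic or SIGBUS */
    case 2:
        return can_sigbus ? ACT_SIGBUS : ACT_LOG;   /* SIGBUS or log */
    default:
        return ACT_LOG;     /* level 3; other values treated the same here */
    }
}
```

The point being that tolerant alone already covers the whole range from "always panic" to "never panic" for uncorrected errors.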

IOW, I think that patch makes sense but please double-check my logic
above first.

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.