RE: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

From: Luck, Tony
Date: Fri Feb 27 2015 - 13:27:27 EST


> When CR4.MCE=0b and an MCE happens, it will shut down the system, at
> least on Intel, according to Tony

I checked with the architects ... and I was right. If you clear CR4.MCE you'll still
see the machine check - and you'll pull the big system reset lever.

If you think the other cpus can survive the reset - then the right thing to do is to
have any offline cpus that show up in the machine check handler just clear MCG_STATUS
and return:

do_machine_check()
{
	/*
	 * Offline cpus may show up for the party - but don't need to do
	 * anything here - send them back home.
	 */
	if (!cpu_online(smp_processor_id())) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
		return;
	}
	...
}
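
For anyone who wants to poke at the idea outside the kernel, here is a rough
user-space model of that early return. It is only a sketch under stated
assumptions: NR_CPUS, cpu_is_online[], mcg_status[], model_machine_check()
and broadcast_machine_check() are made-up stand-ins, not kernel symbols, and
the "real work" of the handler is reduced to a counter. The only hardware
detail it borrows is that MCIP is bit 2 of MCG_STATUS.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4                      /* hypothetical CPU count for this model */
#define MCG_STATUS_MCIP (1ULL << 2)    /* "machine check in progress" bit */

/* Hypothetical stand-ins for kernel state: online map and per-CPU MCG_STATUS. */
static bool cpu_is_online[NR_CPUS];
static uint64_t mcg_status[NR_CPUS];
static int handled_count;              /* how many cpus ran the full handler */

/* Model of the handler: offline cpus clear MCG_STATUS and go straight home. */
static void model_machine_check(int cpu)
{
	if (!cpu_is_online[cpu]) {
		mcg_status[cpu] = 0;   /* stands in for mce_wrmsrl(MSR_IA32_MCG_STATUS, 0) */
		return;
	}

	handled_count++;               /* placeholder for the real handling work */
	mcg_status[cpu] = 0;           /* online cpus also clear the status when done */
}

/* Model of a broadcast machine check: every cpu, online or not, gets the event. */
static int broadcast_machine_check(void)
{
	handled_count = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		mcg_status[cpu] = MCG_STATUS_MCIP;
		model_machine_check(cpu);
	}
	return handled_count;
}
```

Mark a couple of entries in cpu_is_online[] as true, call
broadcast_machine_check(), and the return value should equal the number of
online cpus, with every mcg_status[] entry back at zero.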

If we are crashing because of a machine check - I wonder how useful it is to run kdump(). There is a very
small set of ways that you can induce a machine check from program action - normally the problem is that
something bad happened in the h/w ... a kdump will just fill your disk and waste your time looking at what
the s/w was doing when the machine check happened.

-Tony