Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

From: Naoya Horiguchi
Date: Mon Mar 02 2015 - 09:33:50 EST


On Mon, Mar 02, 2015 at 01:17:01PM +0100, Borislav Petkov wrote:
> On Mon, Mar 02, 2015 at 02:31:19AM +0000, Naoya Horiguchi wrote:
> > And please note that the target of this patch is an MCE when the kernel is
> > already running the kdump code (so the crash happened *not* because of the MCE).
> > In that case, we can expect kdump to work fine if the MCE hits the "kdump
> > shutdown" CPUs, which are just running the cpu_relax() loop, because the CPU
> > running the 2nd kernel isn't affected by the MCE (even if the CPU failure is
> > a fatal one).
>
> Well, why would you even want to disable MCA then? If all the CPUs are
> offlined, it is very, very unlikely they'd cause an MCE.

Yes, CPU offlining is one option to keep the other CPUs quiet. I'm not sure
why the current kexec implementation just parks the other CPUs in a
cpu_relax() loop instead of offlining them. My guess is that in some panic
situations (like a soft lockup) we want to keep the CPUs' state undisturbed
to make sure the bug's info is captured in the kdump.

> > If a fatal MCE happens on the CPU running the kdump code, there's no reason
> > to try harder to get the kdump, as you pointed out. In such a case, what we
> > can do is print a message like "kdump failed due to MCE" and reset the
> > system.
>
> Yes, so a primitive kdump-specific MCE handler would be more viable than
> disabling MCA.

OK.
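For reference, here is a rough user-space model of the decision such a
handler might make, based on the two cases above. It is a sketch, not a
patch: all names are hypothetical (nothing like this exists in the kernel),
the actual reset is only indicated by a comment, and treating a non-fatal
MCE on the kdump CPU as safe to continue is my assumption.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical actions for a minimal kdump-time MCE handler. */
enum kdump_mce_action {
	KDUMP_MCE_CONTINUE,	/* let the dump proceed */
	KDUMP_MCE_RESET,	/* give up on the dump and reset */
};

/*
 * on_kdump_cpu: the MCE was raised on the CPU executing the kdump code.
 * fatal:        the MCE is fatal (uncorrectable).
 */
static enum kdump_mce_action kdump_mce_decide(bool on_kdump_cpu, bool fatal)
{
	/*
	 * A CPU parked in the cpu_relax() loop does no work for the dump,
	 * so even a fatal error there need not stop kdump.
	 */
	if (!on_kdump_cpu)
		return KDUMP_MCE_CONTINUE;

	/* Fatal error on the CPU doing the dump: no point trying harder. */
	if (fatal)
		return KDUMP_MCE_RESET;

	/* Assumption: a non-fatal error on the kdump CPU is survivable. */
	return KDUMP_MCE_CONTINUE;
}

/* Prints the failure message when giving up; returns the chosen action. */
static enum kdump_mce_action kdump_mce_handle(bool on_kdump_cpu, bool fatal)
{
	enum kdump_mce_action action = kdump_mce_decide(on_kdump_cpu, fatal);

	if (action == KDUMP_MCE_RESET) {
		printf("kdump failed due to MCE\n");
		/* A real handler would reset the machine here. */
	}
	return action;
}
```

So kdump_mce_handle(false, true) lets the dump go on, while
kdump_mce_handle(true, true) prints "kdump failed due to MCE" and would
reset.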

Thanks,
Naoya Horiguchi