Re: [PATCH v3] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic

From: Borislav Petkov
Date: Tue Oct 17 2023 - 07:18:40 EST


On Tue, Oct 17, 2023 at 01:24:53AM +0000, Luck, Tony wrote:
> How about:
>
> When there is a fatal machine check Linux calls mce_panic()
> without checking to see if bad data at some memory address
> was reported in the machine check banks.

... for the simple reason that the kernel cannot afford to do any
unnecessary work there: it has to panic immediately in order to stop
the propagation of bad data.

Now, it's a whole different story whether that's the right thing to do
and whether the data has already propagated so that the panic is moot.

The whole point I'm trying to make is that the machine panics because
the error severity dictates that it do so. And there's simply no
opportunity to queue recovery work in that case. So the commit message
should simply state that we're marking the page as poison for the
kexec'ed kernel's sake and not because of anything else.

> If kexec is enabled, check for memory errors and mark the
> page as poisoned so that the kexec'd kernel can avoid accessing
> the page.

Yap, yours makes sense.
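
Purely as an illustration of the ordering described above, here is a
small userspace sketch, not the actual patch. Everything in it is a
hypothetical stand-in: struct mce_record, the page_hwpoison[] array
(the kernel keeps PG_hwpoison in the struct page flags instead), the
NR_PAGES limit and fatal_mce_mark_poison() itself. It only models the
decision "if a crash kernel is loaded, mark pages with a reported bad
address as poisoned before panicking".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumes 4 KiB pages */
#define NR_PAGES   16   /* tiny, hypothetical "physical memory" */

/* Hypothetical, simplified view of what a machine check bank reported. */
struct mce_record {
	bool     addr_valid;  /* the bank logged a usable physical address */
	uint64_t addr;        /* physical address of the bad data */
};

/* Hypothetical stand-in for the per-page PG_hwpoison flag. */
static bool page_hwpoison[NR_PAGES];

/*
 * Model of the step taken on the fatal path, right before panicking:
 * no recovery work is queued, pages are only flagged so that a kexec'ed
 * capture kernel could skip them. Returns the number of pages that were
 * newly marked.
 */
static int fatal_mce_mark_poison(const struct mce_record *recs, size_t n,
				 bool crash_kernel_loaded)
{
	int marked = 0;

	/* Without a capture kernel there is nobody to benefit from the flag. */
	if (!crash_kernel_loaded)
		return 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t pfn;

		/* Skip banks that did not report a memory address. */
		if (!recs[i].addr_valid)
			continue;

		pfn = recs[i].addr >> PAGE_SHIFT;
		if (pfn >= NR_PAGES)
			continue;

		if (!page_hwpoison[pfn]) {
			page_hwpoison[pfn] = true;
			marked++;
		}
	}

	return marked;
}
```

In this model the caller would go on to panic unconditionally after
the function returns; the marking does not change the severity
decision in any way.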

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette