Re: [PATCH] x86/mce: Add workaround for SKX/CLX/CPX spurious machine checks

From: Borislav Petkov
Date: Wed Feb 16 2022 - 05:29:28 EST


On Tue, Feb 15, 2022 at 02:22:33PM -0800, Luck, Tony wrote:
> This early in do_machine_check() we don't know whether this was from
> an over-enthusiastic REP;MOVS fetch or a "normal" machine check.
> I don't think there is an easy way to tell the difference.

That's what I am wondering: whether we can compare the reported address
against the buffers REP; MOVS was accessing and determine whether the
access was out of bounds. Something à la _ASM_EXTABLE_ as it is done in
arch/x86/lib/copy_mc_64.S, for example, which would land us in
fixup_exception().

Now, there we'd need to know the range the thing was copying, which
should be in pt_regs (RSI, RDI and RCX for REP; MOVS), and the address
the MCE reported. If the latter is not within the former range, we
ignore the MCE.

There's even some blurb about "recovering from fast-string exceptions"
above copy_mc_enhanced_fast_string()...

Hmmm?

> The first check:
>
> if ((mcgstatus & MCG_STATUS_LMCES)
>
> is for "is this a local machine check?" If so, no broadcast sync is
> needed. But that needs a comment.

Yap.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette