Re: [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery

From: Borislav Petkov
Date: Thu Jan 14 2021 - 15:23:23 EST


On Mon, Jan 11, 2021 at 01:44:50PM -0800, Tony Luck wrote:
> @@ -1431,8 +1433,11 @@ noinstr void do_machine_check(struct pt_regs *regs)
> mce_panic("Failed kernel mode recovery", &m, msg);
> }
>
> - if (m.kflags & MCE_IN_KERNEL_COPYIN)
> + if (m.kflags & MCE_IN_KERNEL_COPYIN) {
> + if (current->mce_busy)
> + mce_panic("Multiple copyin", &m, msg);

So this: we're currently busy handling the first MCE, so why must we
panic?

Can we simply ignore all follow-up MCEs to that page?

I.e., the page will get poisoned eventually and that poisoning is
currently executing so all following MCEs are simply nothing new and we
can ignore them.
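
A minimal userspace sketch of that "ignore the repeats" idea, not kernel
code. Only current->mce_busy comes from the quoted patch; the struct, the
mce_pfn field, the enum values and copyin_mce() are hypothetical names made
up for illustration, and 4K pages are assumed. What to do with an MCE on a
different page while recovery is still pending is left as a separate case,
since it is not settled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4K pages */

/* Hypothetical per-task state; the patch itself only shows mce_busy. */
struct task_model {
	bool     mce_busy;	/* recovery for an earlier copyin MCE is pending */
	uint64_t mce_pfn;	/* page frame of that first MCE (assumed field) */
};

enum mce_action {
	MCE_QUEUE_RECOVERY,	/* first hit: start poisoning the page */
	MCE_IGNORE,		/* repeat hit on the same page: nothing new */
	MCE_OTHER_PAGE,		/* different page while busy: policy left open */
};

/* Decide what to do with a copyin MCE reported at physical address paddr. */
static enum mce_action copyin_mce(struct task_model *t, uint64_t paddr)
{
	uint64_t pfn = paddr >> PAGE_SHIFT;

	if (!t->mce_busy) {
		/* First MCE: remember the page and kick off recovery. */
		t->mce_busy = true;
		t->mce_pfn = pfn;
		return MCE_QUEUE_RECOVERY;
	}

	/* Same page is already being poisoned, so there is nothing to add. */
	if (pfn == t->mce_pfn)
		return MCE_IGNORE;

	return MCE_OTHER_PAGE;
}
```

With a zeroed struct task_model, a first call at 0x1000 queues recovery, a
second call at 0x1ff8 falls in the same page and is ignored, and a call at
0x2000 lands in the separate "other page" case.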

It's not like we're going to corrupt more data - we are already
"corrupting" the whole 4K page.

Am I making sense?

Because if we do this, you won't have to touch or pay attention to any
get_user() callers and whatnot - we simply ignore the follow-up MCEs and
the solution stays simple...

Hmmm?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette