Re: [RFC PATCH 0/3] Machine check recovery when kernel accesses poison

From: Borislav Petkov
Date: Tue Nov 10 2015 - 06:21:18 EST


On Mon, Nov 09, 2015 at 10:26:08AM -0800, Tony Luck wrote:
> This is a first draft to show the direction I'm taking to
> make it possible for the kernel to recover from machine
> checks taken while kernel code is executing.

Just a general, why-do-we-do-this, question: on big systems, the memory
occupied by the kernel is a very small percentage of the whole RAM,
right? And yet we want to recover from there too? Rather than, say, kexec...

> Note that I also fudge the return value. I'd like in the future
> to be able to write a "mcsafe_copy_from_user()" function that
> would be annotated both for page faults, to return a count of
> bytes uncopied, or an indication that there was a machine check.
> Hence the BIT(63) bit. Internal feedback suggested we'd need
> some IS_ERR() like macros to help users decode what happened
> to take the right action. But this is "RFC" to see if people
> have better ideas on how to handle this.
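To make the encoding Tony describes concrete, here is a minimal userspace sketch (not kernel code) of a return value that carries either a count of uncopied bytes or a machine-check indication in bit 63, together with IS_ERR()-style helpers for callers. Every name here (MCSAFE_MCE_FLAG, mcsafe_ret_mce(), mcsafe_hit_mce(), mcsafe_remaining(), mcsafe_classify()) is hypothetical, and it assumes the low bits still hold the uncopied count when bit 63 is set, which the RFC does not actually specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: bit 63 set means the copy stopped on a machine check. */
#define MCSAFE_MCE_FLAG (UINT64_C(1) << 63)

/* Hypothetical outcomes a caller has to tell apart. */
enum mcsafe_outcome {
	MCSAFE_OK,	/* everything was copied */
	MCSAFE_FAULT,	/* ordinary page fault: some bytes left uncopied */
	MCSAFE_POISON,	/* machine check on poisoned memory */
};

/* Encoders a copy routine could use on its exit paths. */
static inline uint64_t mcsafe_ret_fault(uint64_t uncopied)
{
	return uncopied;
}

static inline uint64_t mcsafe_ret_mce(uint64_t uncopied)
{
	/* Assumption: keep the uncopied count in the low bits as well. */
	return uncopied | MCSAFE_MCE_FLAG;
}

/* IS_ERR()-style decoders for callers. */
static inline bool mcsafe_hit_mce(uint64_t ret)
{
	return (ret & MCSAFE_MCE_FLAG) != 0;
}

static inline uint64_t mcsafe_remaining(uint64_t ret)
{
	return ret & ~MCSAFE_MCE_FLAG;
}

/* Map a raw return value to the action the caller should take. */
static inline enum mcsafe_outcome mcsafe_classify(uint64_t ret)
{
	if (mcsafe_hit_mce(ret))
		return MCSAFE_POISON;
	return ret ? MCSAFE_FAULT : MCSAFE_OK;
}
```

A caller would branch on mcsafe_classify(): MCSAFE_FAULT for the usual -EFAULT style path, MCSAFE_POISON to deal with the poisoned page. The weak spot is that every caller must remember to decode the value; a plain `if (ret)` check lumps both kinds of failure together.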

Hmm, shouldn't this be using MF_ACTION_REQUIRED or even maybe a new MF_
flag which is converted into a BUS_MCEERR_AR si_code and thus current
gets a signal?

Only setting bit 63 looks a bit flaky to me...
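For comparison, with the signal-based route the affected task learns about the poison through SIGBUS with si_code BUS_MCEERR_AR. Below is a minimal userspace sketch of the receiving side, using sigaction() with SA_SIGINFO. The helper and handler names are hypothetical; the fallback values 4 and 5 are the ones Linux uses in asm-generic/siginfo.h and are only defined here in case the libc headers do not expose them.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Fallbacks with the Linux values, in case libc does not define them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

static volatile sig_atomic_t sigbus_seen = 0;
static volatile sig_atomic_t last_sigbus_code = 0;

/* Hypothetical helper: describe a SIGBUS si_code. */
static const char *sigbus_reason(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return "memory error, action required";
	case BUS_MCEERR_AO:
		return "memory error, action optional";
	default:
		return "other bus error";
	}
}

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	/*
	 * Only async-signal-safe work here: record what arrived. A real
	 * handler for BUS_MCEERR_AR would also inspect info->si_addr to
	 * find the poisoned page and then siglongjmp() or _exit().
	 */
	last_sigbus_code = info->si_code;
	sigbus_seen = 1;
}

/* Install sigbus_handler for SIGBUS; returns 0 on success, -1 on error. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

Because a real BUS_MCEERR_AR is delivered synchronously on the faulting access, simply returning from the handler would normally retry the poisoned load; that is why the comment points at siglongjmp() or _exit() instead.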

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.