Re: [PATCH] x86/mce: Check that memory address is usable for recovery

From: Yazen Ghannam
Date: Tue Apr 18 2023 - 12:41:51 EST


On 3/21/23 20:51, Tony Luck wrote:
> uc_decode_notifier() includes a check that "struct mce"
> contains a valid address for recovery. But the machine
> check recovery code does not include a similar check.
>
> Use mce_usable_address() to check that there is a valid
> address.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
> arch/x86/kernel/cpu/mce/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2eec60f50057..fa28b3f7d945 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  		/* If this triggers there is no way to recover. Die hard. */
>  		BUG_ON(!on_thread_stack() || !user_mode(regs));
>
> -		if (kill_current_task)
> +		if (kill_current_task || !mce_usable_address(&m))
>  			queue_task_work(&m, msg, kill_me_now);
>  		else
>  			queue_task_work(&m, msg, kill_me_maybe);

I think it should be like this:

	if (mce_usable_address(&m))
		queue_task_work(&m, msg, kill_me_maybe);
	else
		queue_task_work(&m, msg, kill_me_now);

A usable address should always go through memory_failure() so that the page is
marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL
flag and try to kill all processes after the page is poisoned.
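
To make the intended flow concrete, here is a rough userspace sketch of the
decision described above. It is not kernel code: the sketch_* names, the
struct, and the flag value are hypothetical stand-ins for illustration, and
any other flags the real recovery path passes to memory_failure() are left
out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum sketch_action { SKETCH_KILL_NOW, SKETCH_KILL_MAYBE };

#define SKETCH_MF_MUST_KILL 0x1	/* models MF_MUST_KILL, value is made up */

struct sketch_mce {
	bool usable_addr;	/* models the result of mce_usable_address(&m) */
	bool ripv;		/* models the RIPV state of the error */
};

/*
 * Choose which task work to queue. A usable address always takes the
 * recovery path (kill_me_maybe), so the page can be poisoned first.
 */
static enum sketch_action sketch_pick_action(const struct sketch_mce *m)
{
	return m->usable_addr ? SKETCH_KILL_MAYBE : SKETCH_KILL_NOW;
}

/*
 * Flags the recovery path would hand to memory_failure(). Only the
 * !RIPV case is modeled here; other flags are omitted.
 */
static int sketch_mf_flags(const struct sketch_mce *m)
{
	int flags = 0;

	if (!m->ripv)
		flags |= SKETCH_MF_MUST_KILL;

	return flags;
}
```

With this shape, an error that has a usable address but !RIPV still goes
through the poisoning path, and the kill is forced through the flag rather
than by skipping memory_failure().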

I had a similar patch a while back:
https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@xxxxxxx/

We could also get rid of kill_me_now() like you had suggested.

Thanks,
Yazen