Re: [PATCH v2] x86/mce: Always call memory_failure() when there is a valid address

From: Yazen Ghannam
Date: Tue Apr 18 2023 - 14:17:41 EST


On 4/18/23 14:03, Tony Luck wrote:
> Linux should always take poisoned pages offline when there is an error
> report with a valid physical address.
>
> Note1: kill_me_maybe() will correctly handle the case currently
> covered by the test of "kill_current_task" that is deleted by this
> change because it will set the MF_MUST_KILL flag when p->mce_ripv is
> not set.
>
> Note2: This also provides defense against the case where the logged
> error doesn't provide a physical address.
>
> Suggested-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
> ---
> arch/x86/kernel/cpu/mce/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2eec60f50057..f72c97860524 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  		/* If this triggers there is no way to recover. Die hard. */
>  		BUG_ON(!on_thread_stack() || !user_mode(regs));
>
> -		if (kill_current_task)
> +		if (mce_usable_address(&m))

This should be !mce_usable_address().

>  			queue_task_work(&m, msg, kill_me_now);
>  		else
>  			queue_task_work(&m, msg, kill_me_maybe);
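
To make the intended polarity explicit, here is a minimal user-space sketch of the dispatch with the negation applied. It is not kernel code: struct mce_sketch, enum task_work_sketch, pick_task_work() and must_kill() are hypothetical stand-ins invented for illustration. The addr_usable field models the result of mce_usable_address(&m), and ripv models p->mce_ripv as described in Note1.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct mce_sketch {
	bool addr_usable;	/* models the result of mce_usable_address(&m) */
	bool ripv;		/* models p->mce_ripv */
};

enum task_work_sketch { KILL_ME_NOW, KILL_ME_MAYBE };

/*
 * Corrected condition: without a usable address there is no page to
 * hand to memory_failure(), so the task is killed outright. With a
 * usable address, the kill_me_maybe() path is always taken.
 */
static enum task_work_sketch pick_task_work(const struct mce_sketch *m)
{
	if (!m->addr_usable)
		return KILL_ME_NOW;
	return KILL_ME_MAYBE;
}

/*
 * Models Note1: on the kill_me_maybe() path, MF_MUST_KILL is set
 * when mce_ripv is not set.
 */
static bool must_kill(const struct mce_sketch *m)
{
	return !m->ripv;
}
```

With the un-negated test from the v2 diff, the two branches would be swapped: a report with a valid address would go to kill_me_now and never reach memory_failure().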

Thanks,
Yazen

P.S. I had the exact change in mind. :)

Copying old patch here. Feel free to reuse any of the commit message if
it helps.