Re: [PATCH] x86/mce: Check that memory address is usable for recovery

From: Yazen Ghannam
Date: Tue Apr 18 2023 - 13:51:47 EST


On 4/18/23 13:27, Tony Luck wrote:
> On Tue, Apr 18, 2023 at 12:41:17PM -0400, Yazen Ghannam wrote:
>> On 3/21/23 20:51, Tony Luck wrote:
>>> uc_decode_notifier() includes a check that "struct mce"
>>> contains a valid address for recovery. But the machine
>>> check recovery code does not include a similar check.
>>>
>>> Use mce_usable_address() to check that there is a valid
>>> address.
>>>
>>> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
>>> ---
>>> arch/x86/kernel/cpu/mce/core.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>>> index 2eec60f50057..fa28b3f7d945 100644
>>> --- a/arch/x86/kernel/cpu/mce/core.c
>>> +++ b/arch/x86/kernel/cpu/mce/core.c
>>> @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>>> /* If this triggers there is no way to recover. Die hard. */
>>> BUG_ON(!on_thread_stack() || !user_mode(regs));
>>>
>>> - if (kill_current_task)
>>> + if (kill_current_task || !mce_usable_address(&m))
>>> queue_task_work(&m, msg, kill_me_now);
>>> else
>>> queue_task_work(&m, msg, kill_me_maybe);
>>
>> I think it should be like this:
>>
>> if (mce_usable_address(&m))
>> queue_task_work(&m, msg, kill_me_maybe);
>> else
>> queue_task_work(&m, msg, kill_me_now);
>>
>> A usable address should always go through memory_failure() so that the page is
>> marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL
>> flag and try to kill all processes after the page is poisoned.
>>
>> I had a similar patch a while back:
>> https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@xxxxxxx/
>>
>> We could also get rid of kill_me_now() like you had suggested.
>
> Can we also get rid of "kill_current_task"? It is only set when there is
> no valid return address:
>
> if (!(m.mcgstatus & MCG_STATUS_RIPV))
> kill_current_task = 1;
>
> kill_me_maybe() does an equivalent check:
>
> if (!p->mce_ripv)
> flags |= MF_MUST_KILL;
>
> Which only leaves this check to resolve in some way:
>
> if (worst != MCE_AR_SEVERITY && !kill_current_task)
> goto out;
>

I agree. And I think all these checks should be baked into the severity.
We'll need additional, fine-grained severity levels though.

The "m.cs" and "m.kflags" checks could also be baked in.

Instead of just one AR severity:
...
MCE_AR_SEVERITY,
MCE_PANIC_SEVERITY,

replace it with specific cases:
...
MCE_AR_USER_RECOV,
MCE_AR_USER_KILL,
MCE_AR_KERNEL_COPYIN,
MCE_AR_KERNEL_RECOV,
MCE_PANIC_SEVERITY,

Then the #MC handler can look like this:

if (worst < MCE_AR_USER_RECOV)
goto out;

if (severity == MCE_AR_USER_RECOV)
queue_task_work(&m, msg, kill_me_maybe);

if (severity == MCE_AR_USER_KILL)
force_sig(SIGBUS);

if (severity == MCE_AR_KERNEL_COPYIN)
queue_task_work(&m, msg, kill_me_never);

if (severity == MCE_AR_KERNEL_RECOV) {
if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
mce_panic("Failed kernel mode recovery");
}

I can take a shot at this if it seems reasonable.

What do you think?

Thanks,
Yazen