Re: [PATCH v2] mm: hwpoison: coredump: support recovery from dump_user_range()

From: HORIGUCHI NAOYA(堀口 直也)
Date: Wed Apr 26 2023 - 22:31:27 EST


On Thu, Apr 27, 2023 at 09:06:46AM +0800, Kefeng Wang wrote:
>
>
> On 2023/4/26 23:45, Luck, Tony wrote:
> > > > > Thanks for your confirm, and what your option about add
> > > > > MCE_IN_KERNEL_COPYIN to EX_TYPE_DEFAULT_MCE_SAFE/FAULT_MCE_SAFE type
> > > > > to let do_machine_check call queue_task_work(&m, msg, kill_me_never),
> > > > > which kill every call memory_failure_queue() after mc safe copy return?
> > > >
> > > > I haven't been following this thread closely. Can you give a link to the e-mail
> > > > where you posted a patch that does this? Or just repost that patch if easier.
> > >
> > > The major diff changes is [1], I will post a formal patch when -rc1 out,
> > > thanks.
> > >
> > > [1]
> > > https://lore.kernel.org/linux-mm/6dc1b117-020e-be9e-7e5e-a349ffb7d00a@xxxxxxxxxx/
> >
> > There seem to be a few misconceptions in that message. Not sure if all of them
> > were resolved. Here are some pertinent points:
> >
> > > > > In my understanding, an MCE should not be triggered when MC-safe copy
> > > > > tries
> > > > > to access to a memory error. So I feel that we might be talking about
> > > > > different scenarios.
> >
> > This is wrong. There is still a machine check when a MC-safe copy does a read
> > from a location that has a memory error.

Yes, the above was my first impression to be proven wrong ;)

> >
> > The recovery flow in this case does not involve queue_task_work(). That is only
> > useful for machine check exceptions taken in user context. The queued work will
> > be executed to call memory_failure() from the kernel, but in process context (not
> > from the machine check exception stack) to handle the error.
> >
> > For machine checks taken by kernel code (MC-safe copy functions) the recovery
> > path is here:
> >
> > if (m.kflags & MCE_IN_KERNEL_RECOV) {
> > if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
> > mce_panic("Failed kernel mode recovery", &m, msg);
> > }
> >
> > if (m.kflags & MCE_IN_KERNEL_COPYIN)
> > queue_task_work(&m, msg, kill_me_never);
> >
> > The "fixup_exception()" ensures that on return from the machine check handler
> > code returns to the extable[] fixup location instead of the instruction that was
> > loading from the memory error location.
> >
> > When the exception was from one of the copy_from_user() variants it makes
> > sense to also do the queue_task_work() because the kernel is going to return
> > to the user context (with an EFAULT error code from whatever system call was
> > attempting the copy_from_user()).
> >
> > But in the core dump case there is no return to user. The process is being
> > terminated by the signal that leads to this core dump. So even though you
> > may consider the page being accessed to be a "user" page, you can't fix
> > it by queueing work to run on return to user.
>
> For coredump,the task work will be called too, see following code,
>
> get_signal
> sig_kernel_coredump
> elf_core_dump
> dump_user_range
> _copy_from_iter // with MC-safe copy, return without panic
> do_group_exit(ksig->info.si_signo);
> do_exit
> exit_task_work
> task_work_run
> kill_me_never
> memory_failure
>
> I also add debug print to check the memory_failure() processing after
> add MCE_IN_KERNEL_COPYIN to MCE_SAFE exception type, also tested CoW of
> normal page and huge page, it works too.

Sounds nice to me.
Maybe this information is worth documenting in the patch description.

Thanks,
Naoya Horiguchi