Re: [PATCH v2] mm: hwpoison: coredump: support recovery from dump_user_range()

From: HORIGUCHI NAOYA(堀口 直也)
Date: Wed Apr 19 2023 - 03:26:37 EST


On Tue, Apr 18, 2023 at 05:45:06PM +0800, Kefeng Wang wrote:
>
>
> On 2023/4/18 11:13, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Mon, Apr 17, 2023 at 12:53:23PM +0800, Kefeng Wang wrote:
> > > The dump_user_range() is used to copy the user page to a coredump file,
> > > but if a hardware memory error occurred during copy, which called from
> > > __kernel_write_iter() in dump_user_range(), it crashes,
> > >
> > > CPU: 112 PID: 7014 Comm: mca-recover Not tainted 6.3.0-rc2 #425
> > > pc : __memcpy+0x110/0x260
> > > lr : _copy_from_iter+0x3bc/0x4c8
> > > ...
> > > Call trace:
> > > __memcpy+0x110/0x260
> > > copy_page_from_iter+0xcc/0x130
> > > pipe_write+0x164/0x6d8
> > > __kernel_write_iter+0x9c/0x210
> > > dump_user_range+0xc8/0x1d8
> > > elf_core_dump+0x308/0x368
> > > do_coredump+0x2e8/0xa40
> > > get_signal+0x59c/0x788
> > > do_signal+0x118/0x1f8
> > > do_notify_resume+0xf0/0x280
> > > el0_da+0x130/0x138
> > > el0t_64_sync_handler+0x68/0xc0
> > > el0t_64_sync+0x188/0x190
> > >
> > > Generally, the '->write_iter' of file ops will use copy_page_from_iter()
> > > and copy_page_from_iter_atomic(), change memcpy() to copy_mc_to_kernel()
> > > in both of them to handle #MC during source read, which stop coredump
> > > processing and kill the task instead of kernel panic, but the source
> > > address may not always a user address, so introduce a new copy_mc flag in
> > > struct iov_iter{} to indicate that the iter could do a safe memory copy,
> > > also introduce the helpers to set/cleck the flag, for now, it's only
> > > used in coredump's dump_user_range(), but it could expand to any other
> > > scenarios to fix the similar issue.
> > >
> > > Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
> > > Cc: Christian Brauner <brauner@xxxxxxxxxx>
> > > Cc: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> > > Cc: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> > > Cc: Tong Tiangen <tongtiangen@xxxxxxxxxx>
> > > Cc: Jens Axboe <axboe@xxxxxxxxx>
> > > Signed-off-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
> > > ---
> > > v2:
> > > - move the helper functions under pre-existing CONFIG_ARCH_HAS_COPY_MC
> > > - reposition the copy_mc in struct iov_iter for easy merge, suggested
> > > by Andrew Morton
> > > - drop unnecessary clear flag helper
> > > - fix checkpatch warning
> > > fs/coredump.c | 1 +
> > > include/linux/uio.h | 16 ++++++++++++++++
> > > lib/iov_iter.c | 17 +++++++++++++++--
> > > 3 files changed, 32 insertions(+), 2 deletions(-)
> > >
> > ...
> > > @@ -371,6 +372,14 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> > > EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
> > > #endif /* CONFIG_ARCH_HAS_COPY_MC */
> > > +static void *memcpy_from_iter(struct iov_iter *i, void *to, const void *from,
> > > + size_t size)
> > > +{
> > > + if (iov_iter_is_copy_mc(i))
> > > + return (void *)copy_mc_to_kernel(to, from, size);
> >
> > Is it helpful to call memory_failure_queue() if copy_mc_to_kernel() fails
> > due to a memory error?
>
> For dump_user_range(), the task is dying, if copy incomplete size, the
> coredump will fail and task will exit, also memory_failure will
> be called by kill_me_maybe(),
>
> CPU: 0 PID: 1418 Comm: test Tainted: G M 6.3.0-rc5 #29
> Call Trace:
> <TASK>
> dump_stack_lvl+0x37/0x50
> memory_failure+0x51/0x970
> kill_me_maybe+0x5b/0xc0
> task_work_run+0x5a/0x90
> exit_to_user_mode_prepare+0x194/0x1a0
> irqentry_exit_to_user_mode+0x9/0x30
> noist_exc_machine_check+0x40/0x80
> asm_exc_machine_check+0x33/0x40

Is this call trace printed out when copy_mc_to_kernel() failed by finding
a memory error (or in some testcase using error injection)?
In my understanding, an MCE should not be triggered when MC-safe copy tries
to access to a memory error. So I feel that we might be talking about
different scenarios.

When I questioned previously, I thought about the following scenario:

- a process terminates abnormally for any reason like segmentation fault,
- then, kernel tries to create a coredump,
- during this, the copying routine accesses to corrupted page to read.

In this case the corrupted page should not be handled by memory_failure()
yet (because otherwise properly handled hwpoisoned page should be ignored
by coredump process). The coredump process would exit with failure with
your patch, but then, the corrupted page is still left unhandled and can
be reused, so any other thread can easily access to it again.

You can find a few other places (like __wp_page_copy_user and ksm_might_need_to_copy)
to call memory_failure_queue() to cope with such unhandled error pages.
So does memcpy_from_iter() do the same?

Thanks,
Naoya Horiguchi