Re: [PATCH] coredump: Limit what can interrupt coredumps

From: Olivier Langlois
Date: Wed Aug 11 2021 - 16:48:25 EST


On Tue, 2021-08-10 at 17:48 -0400, Tony Battersby wrote:
> >
> I just ran into this problem also - coredumps from an io_uring
> program
> to a pipe are truncated.  But I am using kernel 5.10.57, which does
> NOT
> have commit 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> or
> commit 06af8679449d ("coredump: Limit what can interrupt
> coredumps"). 
> Kernel 5.4 works though, so I bisected the problem to commit
> f38c7e3abfba ("io_uring: ensure async buffered read-retry is setup
> properly") in kernel 5.9.  Note that my io_uring program uses only
> async
> buffered reads, which may be why this particular commit makes a
> difference to my program.
>
> My io_uring program is a multi-purpose long-running program with many
> threads.  Most threads don't use io_uring but a few of them do. 
> Normally, my core dumps are piped to a program so that they can be
> compressed before being written to disk, but I can also test writing
> the
> core dumps directly to disk.  This is what I have found:
>
> *) Unpatched 5.10.57: if a thread that doesn't use io_uring triggers
> a
> coredump, the core file is written correctly, whether it is written
> to
> disk or piped to a program, even if another thread is using io_uring
> at
> the same time.
>
> *) Unpatched 5.10.57: if a thread that uses io_uring triggers a
> coredump, the core file is truncated, whether written directly to
> disk
> or piped to a program.
>
> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring
> triggers a coredump, and the core is written directly to disk, then
> it
> is written correctly.
>
> *) 5.10.57+backport 06af8679449d: if a thread that uses io_uring
> triggers a coredump, and the core is piped to a program, then it is
> truncated.
>
> *) 5.10.57+revert f38c7e3abfba: core dumps are written correctly,
> whether written directly to disk or piped to a program.
>
> Tony Battersby
> Cybernetics
>
Tony,

this is super interesting details. I'm leaving for few days so I will
not be able to look into it until I am back but here is my
interpretation of your findings:

f38c7e3abfba makes it more likely that your task ends up in a fd read
wait queue. Previously the io_uring req queuing was failing and
returning EAGAIN. Now it ends up using io uring fast poll.

When the core dump gets written through a pipe, pipe_write must block
waiting for some event. If the task gets waken up by the io_uring wait
queue entry instead, it must somehow make pipe_write fails.

So the problem must be a mix of TIF_NOTIFY_SIGNAL and the fact that
io_uring wait queue entries aren't cleaned up while doing the core
dump.

I have a new modif to try out. I'll hopefully be able to submit a patch
to fix that once I come back (I cannot do it now or else, I'll never
leave ;-))