Re: [PATCH] coredump debugging: add a tracepoint to report the coredumping

From: Wen Yang
Date: Wed Feb 21 2024 - 10:51:20 EST




On 2024/2/20 01:00, Oleg Nesterov wrote:
On 02/19, Steven Rostedt wrote:

On Sat, 17 Feb 2024 11:49:24 +0100
Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

On 02/17, wenyang.linux@xxxxxxxxxxx wrote:

From: Wen Yang <wenyang.linux@xxxxxxxxxxx>

Currently coredump_task_exit() takes some time to wait for the generation
of the dump file. But if the user-space wants to receive a notification
as soon as possible it maybe inconvenient.

Add the new trace_sched_process_coredump() into coredump_task_exit(),
this way a user-space monitor could easily wait for the exits and
potentially make some preparations in advance.

Can't comment, I never know when the new tracepoint will make sense.

Stupid question. Can we simply shift trace_sched_process_exit() up
before coredump_task_exit() ?

Reading the rest of the thread and looking at the code, we do have this:

void __noreturn do_exit(long code)
{
struct task_struct *tsk = current;
int group_dead;

[...]
acct_collect(code, group_dead);
if (group_dead)
tty_audit_exit();
audit_free(tsk);

tsk->exit_code = code;
taskstats_exit(tsk, group_dead);

exit_mm();

if (group_dead)
acct_process();
trace_sched_process_exit(tsk);

There's a lot that happens before we trigger the above event.

and a lot after.

To me the current placement of looks absolutely
random.

I could
imagine that there are users expecting those actions to have taken place by
the time the event triggered. Like the "exit_mm()" call, as well as many
others.

I would be leery of moving that tracepoint.

And I agree. I am always scared of every user-visible change, simply
because it is user-visbible.

If it was not clear, I didn't try to nack this patch. I simply do not know
how people use the tracepoints and for what. Apart from debugging.

But if we add the new one into coredump_task_exit(), then we probably want
another one in ptrace_event(PTRACE_EVENT_EXIT) ? It too can "take some time"
before the exiting task actually exits.

So I think this needs some discussion, and the changelog should probably say
more.

In short: I am glad you are here, I leave this to you and Wen ;)


Thank you Oleg, thank you Steven,

We could first put trace_sched_process_exit() aside and take a look at these three patches:

2d4bcf886e42f0f4846a3d9bdc3a90d278903a2e ("exit: Remove profile_task_exit & profile_munmap")

586b58cac8b4683eb58a1446fbc399de18974e40 (“exit: Move preemption fixup up, move blocking operations down”)

And the original: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2


Could we add a new TP similar to profile_task_exit()?

--
Best wishes,
Wen