Re: [PATCH] coredump debugging: add a tracepoint to report the coredumping

From: Wen Yang
Date: Wed Feb 21 2024 - 11:14:24 EST




On 2024/2/20 01:28, Steven Rostedt wrote:
On Mon, 19 Feb 2024 18:00:38 +0100
Oleg Nesterov <oleg@xxxxxxxxxx> wrote:

void __noreturn do_exit(long code)
{
struct task_struct *tsk = current;
int group_dead;

[...]
acct_collect(code, group_dead);
if (group_dead)
tty_audit_exit();
audit_free(tsk);

tsk->exit_code = code;
taskstats_exit(tsk, group_dead);

exit_mm();

if (group_dead)
acct_process();
trace_sched_process_exit(tsk);

There's a lot that happens before we trigger the above event.

and a lot after.

True. There really isn't a meaningful location here is there?

I actually use this tracepoint in my pid tracing.

The set_ftrace_pid and set_event_pid from /sys/kernel/tracing will add and
remove PIDs if the options function-fork or event-fork are set respectively.

I hook to the sched_process_fork tracepoint to add new PIDs if the parent
pid is already in one of the files, and remove a PID via the
sched_process_exit function.

Honestly, if anything, it should probably be moved down right next to
() (I never understood why needed its own hooks
and not just use tracepoints).


Perhaps it's just because perf appeared earlier, and it doesn't rely on TRACEPOINTS.
It is indeed reasonable to replace perf_event_exit_task() with TRACEPOINT, and we are willing to try to modify it. It will require some work and time.

--
Best wishes,
Wen


To me the current placement of trace_sched_process_exit() looks absolutely
random.

Agreed.


I could
imagine that there are users expecting those actions to have taken place by
the time the event triggered. Like the "exit_mm()" call, as well as many
others.

I would be leery of moving that tracepoint.

And I agree. I am always scared of every user-visible change, simply
because it is user-visbible.

If it was not clear, I didn't try to nack this patch. I simply do not know
how people use the tracepoints and for what. Apart from debugging.

But if we add the new one into coredump_task_exit(), then we probably want
another one in ptrace_event(PTRACE_EVENT_EXIT) ? It too can "take some time"
before the exiting task actually exits.

So I think this needs some discussion, and the changelog should probably say
more.

In short: I am glad you are here, I leave this to you and Wen ;)

I still would like to have your input too ;-)

-- Steve