Re: [PATCH 1/2] exit: add a tracepoint for profiling a task that is starting to exit

From: Wen Yang
Date: Fri Feb 23 2024 - 00:18:06 EST




On 2024/2/23 00:25, Mathieu Desnoyers wrote:
On 2024-02-22 11:04, wenyang.linux@xxxxxxxxxxx wrote:
From: Wen Yang <wenyang.linux@xxxxxxxxxxx>

Currently, coredump_task_exit() can take some time while it waits for the
dump file to be generated. This is inconvenient if user space wants to be
notified of the exit as soon as possible.

Commit 2d4bcf886e42 ("exit: Remove profile_task_exit & profile_munmap")
simplified the code, but also removed profile_task_exit(), which may
prevent third-party kernel modules from detecting process exits in a
timely manner.

Add the new trace_sched_profile_task_exit() tracepoint so that a user-space
monitor can detect exits early and potentially make some preparations in
advance.

I don't see any explanation justifying adding an extra tracepoint
rather than just moving trace_sched_process_exit() earlier in do_exit().

Why is moving trace_sched_process_exit() earlier in do_exit() an issue,
considering that any tracer interested in knowing the point where a task
is really reclaimed (from zombie state) can use trace_sched_process_free(),
which is called from delayed_put_task_struct()?

Thanks,

Mathieu


Thanks.
We will make the modifications according to your suggestions.

--
Best wishes,
Wen


Signed-off-by: Wen Yang <wenyang.linux@xxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
---
  include/trace/events/sched.h | 28 ++++++++++++++++++++++++++++
  kernel/exit.c                |  1 +
  2 files changed, 29 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index dbb01b4b7451..750b2f0bdf69 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -341,6 +341,34 @@ DEFINE_EVENT(sched_process_template, sched_wait_task,
      TP_PROTO(struct task_struct *p),
      TP_ARGS(p));
+/*
+ * Tracepoint for profiling a task that is starting to exit:
+ */
+TRACE_EVENT(sched_profile_task_exit,
+
+    TP_PROTO(struct task_struct *task, long code),
+
+    TP_ARGS(task, code),
+
+    TP_STRUCT__entry(
+        __array(    char,    comm,    TASK_COMM_LEN    )
+        __field(    pid_t,    pid            )
+        __field(    int,    prio            )
+        __field(    long,    code            )
+    ),
+
+    TP_fast_assign(
+        memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
+        __entry->pid        = task->pid;
+        __entry->prio        = task->prio;
+        __entry->code        = code;
+    ),
+
+    TP_printk("comm=%s pid=%d prio=%d exit_code=0x%lx",
+          __entry->comm, __entry->pid, __entry->prio,
+          __entry->code)
+);
+
  /*
   * Tracepoint for a waiting task:
   */
diff --git a/kernel/exit.c b/kernel/exit.c
index 493647fd7c07..f675f879a1b2 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -826,6 +826,7 @@ void __noreturn do_exit(long code)
      WARN_ON(tsk->plug);
+    trace_sched_profile_task_exit(tsk, code);
      kcov_task_exit(tsk);
      kmsan_task_exit(tsk);
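
For anyone who wants to consume such an event from user space, here is a
minimal sketch of a parser for the text that the TP_printk() above would
emit ("comm=%s pid=%d prio=%d exit_code=0x%lx"). It assumes the format stays
exactly as in this patch, which may not hold once the patch is reworked.
struct exit_event, parse_exit_event() and COMM_LEN are hypothetical names
used only for illustration; they are not part of the patch or of any kernel
API.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: mirrors the kernel's TASK_COMM_LEN (16 bytes including NUL). */
#define COMM_LEN 16

/* Hypothetical user-space view of one sched_profile_task_exit record. */
struct exit_event {
	char comm[COMM_LEN];
	int pid;
	int prio;
	long code;
};

/*
 * Parse one line of trace output that contains
 *   "comm=<name> pid=<n> prio=<n> exit_code=0x<hex>"
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_exit_event(const char *line, struct exit_event *ev)
{
	const char *comm = strstr(line, "comm=");
	const char *pid = NULL;
	const char *p;
	unsigned long code;
	size_t len;

	if (!comm)
		return -1;
	comm += strlen("comm=");

	/* comm may contain spaces, so anchor on the last " pid=" in the line. */
	for (p = strstr(comm, " pid="); p; p = strstr(p + 1, " pid="))
		pid = p;
	if (!pid)
		return -1;

	/* Copy the task name, truncating to the buffer size if necessary. */
	len = (size_t)(pid - comm);
	if (len >= COMM_LEN)
		len = COMM_LEN - 1;
	memcpy(ev->comm, comm, len);
	ev->comm[len] = '\0';

	/* The literal "0x" is consumed first, then %lx reads the hex digits. */
	if (sscanf(pid, " pid=%d prio=%d exit_code=0x%lx",
		   &ev->pid, &ev->prio, &code) != 3)
		return -1;

	ev->code = (long)code;
	return 0;
}
```

A monitor would call parse_exit_event() on each line it reads from the
trace buffer and skip lines for which it returns -1. Searching for the last
" pid=" rather than the first keeps task names that contain spaces intact.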