Re: [PATCH v2 3/4] ftrace: add max execution time mesurement toworkqueue tracer

From: Frederic Weisbecker
Date: Mon Apr 13 2009 - 12:17:11 EST


On Mon, Apr 13, 2009 at 02:53:45PM +0900, KOSAKI Motohiro wrote:
> Subject: [PATCH] ftrace: add max execution time mesurement to workqueue tracer
>
> Too slow workqueue handler often introduce some trouble to system.
> Then, administrator want to know may handler execution time.
>
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
> Cc: Zhaolei <zhaolei@xxxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Cc: Tom Zanussi <tzanussi@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxx>
> ---
> kernel/trace/trace_workqueue.c | 48 +++++++++++++++++++++++++++++++++++++----
> 1 file changed, 44 insertions(+), 4 deletions(-)
>
> Index: b/kernel/trace/trace_workqueue.c
> ===================================================================
> --- a/kernel/trace/trace_workqueue.c 2009-04-13 13:23:59.000000000 +0900
> +++ b/kernel/trace/trace_workqueue.c 2009-04-13 13:59:51.000000000 +0900
> @@ -27,6 +27,10 @@ struct cpu_workqueue_stats {
> * on a single CPU.
> */
> unsigned int executed;
> +
> + /* store handler execution time */
> + u64 handler_start_time;
> + u64 max_executed_time;
> };



Imho, this is the wrong place to instrument.
I explain my point of view below.



> /* List of workqueue threads on one cpu */
> @@ -77,6 +81,7 @@ probe_workqueue_entry(struct task_struct
> list) {
> if (node->pid == wq_thread->pid) {
> node->executed++;
> + node->handler_start_time = trace_clock_global();
> goto found;
> }
> }
> @@ -85,6 +90,29 @@ found:
> spin_unlock_irqrestore(&workqueue_cpu_stat(cpu)->lock, flags);
> }
>
> +static void probe_workqueue_exit(struct task_struct *wq_thread,
> + struct work_struct *work)
> +{
> + int cpu = cpumask_first(&wq_thread->cpus_allowed);
> + struct cpu_workqueue_stats *node, *next;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&workqueue_cpu_stat(cpu)->lock, flags);
> + list_for_each_entry_safe(node, next, &workqueue_cpu_stat(cpu)->list,
> + list) {


Do you need the safe version here? You don't seem to remove any entry.

Sidenote: only the workqueue destruction handler might need it if I'm
not wrong.
I placed some of them in other places in this file because I misunderstood the
kernel list concepts in the past :)
(Heh, and probably still today).



> + if (node->pid == wq_thread->pid) {
> + u64 start = node->handler_start_time;
> + u64 executed_time = trace_clock_global() - start;
> +
> + if (node->max_executed_time < executed_time)
> + node->max_executed_time = executed_time;
> + goto found;
> + }
> + }
> +found:
> + spin_unlock_irqrestore(&workqueue_cpu_stat(cpu)->lock, flags);
> +}
> +
> /* Creation of a cpu workqueue thread */
> static void probe_workqueue_creation(struct task_struct *wq_thread, int cpu)
> {
> @@ -195,6 +223,9 @@ static int workqueue_stat_show(struct se
> int cpu = cws->cpu;
> struct pid *pid;
> struct task_struct *tsk;
> + unsigned long long exec_time = ns2usecs(cws->max_executed_time);
> + unsigned long exec_usec_rem = do_div(exec_time, USEC_PER_SEC);
> + unsigned long exec_secs = (unsigned long)exec_time;
>
> spin_lock_irqsave(&workqueue_cpu_stat(cpu)->lock, flags);
> if (&cws->list == workqueue_cpu_stat(cpu)->list.next)
> @@ -205,8 +236,11 @@ static int workqueue_stat_show(struct se
> if (pid) {
> tsk = get_pid_task(pid, PIDTYPE_PID);
> if (tsk) {
> - seq_printf(s, "%3d %6d %6u %s\n", cws->cpu,
> + seq_printf(s, "%3d %6d %6u %5lu.%06lu"
> + " %s\n",
> + cws->cpu,
> atomic_read(&cws->inserted), cws->executed,
> + exec_secs, exec_usec_rem,




You are measuring the latency from a workqueue thread point of view.
While I find the work latency measurement very interesting,
I think this patch does it in the wrong place. The _work_ latency point of view
seems to me much more rich as an information source.

There are several reasons for that.

Indeed this patch is useful for workqueues that receive always the same work
to perform so that you can find very easily the guilty worklet.
But the sense of this design is lost once we consider the workqueue threads
that receive random works. Of course the best example is events/%d
One will observe the max latency that happened on event/0 as an exemple but
he will only be able to feel a silent FUD because he has no way to find
which work caused this max latency.


Especially the events/%d latency measurement seems to me very important
because a single work from a random driver can propagate its latency
all over the system.

A single work that consumes too much cpu time, waits for long coming
events, sleeps too much, tries to take too often contended locks, or
whatever... such single work may delay all pending works in the queue
and the only max latency for a given workqueue is not helpful to find
these culprits.

Having this max latency snapshot per work and not per workqueue thread
would be useful for every kind of workqueue latency instrumentation:

- workqueues with single works
- workqueue with random works

A developer will also be able to measure its own worklet action and
find if it takes too much time, even if it isn't the worst worklet in
the workqueue to cause latencies.

The end result would be to have a descending latency sort of works
per cpu workqueue threads (or better: per workqueue group).

What do you think?

Frederic.



> tsk->comm);
> put_task_struct(tsk);
> }
> @@ -218,8 +252,8 @@ static int workqueue_stat_show(struct se
>
> static int workqueue_stat_headers(struct seq_file *s)
> {
> - seq_printf(s, "# CPU INSERTED EXECUTED NAME\n");
> - seq_printf(s, "# | | | |\n");
> + seq_printf(s, "# CPU INSERTED EXECUTED MAX_TIME NAME\n");
> + seq_printf(s, "# | | | | |\n");
> return 0;
> }
>
> @@ -259,10 +293,14 @@ int __init trace_workqueue_early_init(vo
> if (ret)
> goto no_insertion;
>
> - ret = register_trace_workqueue_creation(probe_workqueue_creation);
> + ret = register_trace_workqueue_handler_exit(probe_workqueue_exit);
> if (ret)
> goto no_handler_entry;
>
> + ret = register_trace_workqueue_creation(probe_workqueue_creation);
> + if (ret)
> + goto no_handler_exit;
> +
> ret = register_trace_workqueue_destruction(probe_workqueue_destruction);
> if (ret)
> goto no_creation;
> @@ -276,6 +314,8 @@ int __init trace_workqueue_early_init(vo
>
> no_creation:
> unregister_trace_workqueue_creation(probe_workqueue_creation);
> +no_handler_exit:
> + unregister_trace_workqueue_handler_exit(probe_workqueue_exit);
> no_handler_entry:
> unregister_trace_workqueue_handler_entry(probe_workqueue_entry);
> no_insertion:
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/