Re: [PATCH 0/5] [RFC] binary reading of ftrace ring buffers

From: Ingo Molnar
Date: Sun Mar 08 2009 - 15:22:21 EST



* Jiaying Zhang <jiayingz@xxxxxxxxxx> wrote:

> I would like to point out that we think it is really important
> to have some very efficient probing mechanism in the kernel
> for tracing in production environments. The printf and va_arg
> based probes are flexible but less efficient when we want to
> trace high-throughput events. Even function calls can add
> noticeable overhead according to our measurements. So I think
> we need to provide a way (mostly via macro definitions) with
> which a subsystem can enter an event into a trace buffer
> through a short code path. I.e., we should limit the number of
> callbacks and avoid format string parsing.
>
> As I understand, Steven's latest TRACE_FIELD patch avoids such
> overhead, although it does seem to add complexity for adding
> new trace points. [...]

Yeah - it was motivated by the patches you sent to lkml which
showed that it's possible to do it quite sanely and that it can
be done faster.
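
To make the "short code path" point concrete, here is a rough userspace sketch of the idea, not the ftrace implementation: the probe copies a fixed-layout binary record into a buffer, and formatting is deferred to the reader. All names here (trace_buf, sample_event, trace_write, trace_format) are hypothetical, and the flat buffer ignores wraparound and concurrency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUF_BYTES 256

/* Hypothetical flat trace buffer (no wraparound, no locking). */
struct trace_buf {
	unsigned char data[BUF_BYTES];
	size_t used;
};

/* Hypothetical fixed-layout event record. */
struct sample_event {
	int pid;
	int prio;
};

/*
 * Hot path: copy a binary record into the buffer.
 * No format string is parsed here.
 * Returns 0 on success, -1 if the buffer is full.
 */
static int trace_write(struct trace_buf *tb, const void *rec, size_t len)
{
	assert(len <= BUF_BYTES);
	if (len > BUF_BYTES - tb->used)
		return -1;
	memcpy(tb->data + tb->used, rec, len);
	tb->used += len;
	return 0;
}

/*
 * Slow path: the reader decodes record number 'index' and only
 * then pays for the printf-style formatting.
 * Returns the snprintf() result, or -1 if there is no such record.
 */
static int trace_format(const struct trace_buf *tb, size_t index,
			char *out, size_t outlen)
{
	struct sample_event ev;
	size_t off = index * sizeof(ev);

	if (off + sizeof(ev) > tb->used)
		return -1;
	memcpy(&ev, tb->data + off, sizeof(ev));
	return snprintf(out, outlen, "pid=%d prio=%d", ev.pid, ev.prio);
}
```

The cost at the probe site is one bounds check and one memcpy(); the format string is touched only when somebody reads the buffer.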

> [...] It would be nice if we can replace the above
> sched_switch declaration with just a couple of macros.

Good point - there's ongoing work to simplify the TRACE_FIELD
approach. The current (not yet pushed out) optimized tracepoint
format Steve is working on is:

/*
 * Tracepoint for task switches, performed by the scheduler:
 *
 * (NOTE: the 'rq' argument is not used by generic trace events,
 *        but used by the latency tracer plugin. )
 */
TRACE_EVENT(sched_switch,

	TP_PROTO(struct rq *rq, struct task_struct *prev,
		 struct task_struct *next),

	TP_ARGS(rq, prev, next),

	TP_STRUCT__entry(
		__array(	char,	prev_comm,	TASK_COMM_LEN	)
		__field(	pid_t,	prev_pid			)
		__field(	int,	prev_prio			)
		__array(	char,	next_comm,	TASK_COMM_LEN	)
		__field(	pid_t,	next_pid			)
		__field(	int,	next_prio			)
	),

	TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
		__entry->next_comm, __entry->next_pid, __entry->next_prio),

	TP_fast_assign(
		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
		__entry->prev_pid	= prev->pid;
		__entry->prev_prio	= prev->prio;
		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
		__entry->next_pid	= next->pid;
		__entry->next_prio	= next->prio;
	)
);

As you can see, it enumerates the fields, provides format-based
tracing, and defines a tracepoint as well. It also looks quite
similar to C syntax while still being an information-dense macro.
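
For readers unfamiliar with how one declaration can yield several things at once, here is a minimal userspace sketch of the general "expand one field list several ways" preprocessor technique. It is not the TRACE_EVENT implementation; every name in it (SWITCH_FIELDS, AS_MEMBER, field_desc, switch_assign, switch_format, COMM_LEN as a stand-in for TASK_COMM_LEN) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define COMM_LEN 16	/* stand-in for TASK_COMM_LEN */

/* Hypothetical field list: the single place that names every field. */
#define SWITCH_FIELDS(FIELD, ARRAY)		\
	ARRAY(char, prev_comm, COMM_LEN)	\
	FIELD(int,  prev_pid)			\
	FIELD(int,  prev_prio)			\
	ARRAY(char, next_comm, COMM_LEN)	\
	FIELD(int,  next_pid)			\
	FIELD(int,  next_prio)

/* Expansion 1: the binary record layout. */
#define AS_MEMBER(type, name)			type name;
#define AS_ARRAY_MEMBER(type, name, len)	type name[len];

struct switch_entry {
	SWITCH_FIELDS(AS_MEMBER, AS_ARRAY_MEMBER)
};

/* Expansion 2: a table describing each field to a binary reader. */
struct field_desc {
	const char *type;
	const char *name;
	size_t offset;
	size_t size;
};

#define AS_DESC(type, name) \
	{ #type, #name, offsetof(struct switch_entry, name), sizeof(type) },
#define AS_ARRAY_DESC(type, name, len) \
	{ #type "[]", #name, offsetof(struct switch_entry, name), \
	  sizeof(type) * (len) },

static const struct field_desc switch_fields[] = {
	SWITCH_FIELDS(AS_DESC, AS_ARRAY_DESC)
};

#define NUM_SWITCH_FIELDS (sizeof(switch_fields) / sizeof(switch_fields[0]))

/* Plays the role of the assignment step: plain stores, no parsing. */
static void switch_assign(struct switch_entry *e,
			  const char *prev_comm, int prev_pid, int prev_prio,
			  const char *next_comm, int next_pid, int next_prio)
{
	assert(e != NULL);
	snprintf(e->prev_comm, COMM_LEN, "%s", prev_comm);
	e->prev_pid = prev_pid;
	e->prev_prio = prev_prio;
	snprintf(e->next_comm, COMM_LEN, "%s", next_comm);
	e->next_pid = next_pid;
	e->next_prio = next_prio;
}

/* Plays the role of the printk step: format an already-stored record. */
static int switch_format(char *buf, size_t len, const struct switch_entry *e)
{
	return snprintf(buf, len, "task %s:%d [%d] ==> %s:%d [%d]",
			e->prev_comm, e->prev_pid, e->prev_prio,
			e->next_comm, e->next_pid, e->next_prio);
}
```

The field table is what a binary reader would need to interpret raw records without a format string, while switch_format() covers the human-readable case from the same field list.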

Ingo