Re: Kernel marker has no performance impact on ia64.

From: Mathieu Desnoyers
Date: Thu Jun 12 2008 - 09:53:32 EST

Next message: Pierre Ossman: "Re: [PATCH 2/8] Alchemy: register mmc platform device fordb1200/pb1200 boards."
Previous message: Takashi Iwai: "Re: ALC883 recording troubles..."
Next in thread: Peter Zijlstra: "Re: Kernel marker has no performance impact on ia64."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Peter,

* Peter Zijlstra (peterz@xxxxxxxxxxxxx) wrote:
> On Wed, 2008-06-04 at 19:22 -0400, Mathieu Desnoyers wrote:
> > * Peter Zijlstra (peterz@xxxxxxxxxxxxx) wrote:
>
> > > So are you proposing something like:
> > >
> > > static inline void
> > > trace_sched_switch(struct task_struct *prev, struct task_struct *next)
> > > {
> > > trace_mark(sched_switch, prev, next);
> > > }
> > >
> >
> > Not exactly. Something more along the lines of
> >
> > static inline void
> > trace_sched_switch(struct task_struct *prev, struct task_struct *next)
> > {
> > /* Internal tracers. */
> > ftrace_sched_switch(prev, next);
> > othertracer_sched_switch(prev, next);
> > /*
> > * System-wide tracing. Useful information is exported here.
> > * Probes connecting to these markers are expected to only use the
> > * information provided to them for data collection purpose. Type
> > * casting pointers is discouraged.
> > */
> > trace_mark(kernel_sched_switch, "prev_pid %d next_pid %d prev_state %ld",
> > prev->pid, next->pid, prev->state);
> > }
>
> Advantage of my method would be that ftrace (and othertracer) can use
> the same marker and doesn't need yet another hoook.
>

Am I correct by saying that the method you propose completely removes
type checking between the instrumentation site and what the probes
expect ? If yes, this seems to be too fragile. Every time a marker would
change, one would have to audit _every_ probes, both in-kernel and in
modules. Adding type checking to the marker infrastructure makes
automatic detection of these changes possible.

> > > dropping the silly fmt string but using the multiplex of trace_mark, and
> > > then doing the stringify bit:
> > >
> > > "prev_pid %d next_pid %d prev_state %ld\n"
> > >
> > > in the actual tracer?
> > >
> >
> > It would make much more sense to put this formatting information along
> > with the trace point (e.g. in a a kernel/sched-trace.h header) rather
> > that to hide it in a tracer (loadable module) because this information
> > is an interface to the trace point.
>
> I'm not sure - it seems to me it should be part of the tracer because
> its a detail/subset of the actual data - rendering it useless for others
> who'd like a different set.
>

If it ends up elsewhere, then we have to ensure type correctness in some
way.

> > > IMHO the 'type safety' of the fmt string is over-rated, since it cannot
> > > distinguish between a task_struct * or a bio *, both are a pointers -
> > > and half arsed type safely is worse than no type safety.
> > >
> >
> > I totally agree with you that not having the capacity to inspect pointer
> > types is a problem for tracers which wants to receive the "raw" pointer
> > and deal with the data they need like big boys. On the other hand, it
> > requires them to be closely tied to the kernel internals and therefore
> > it makes sense to call them directly from the tracing site, thus
> > bypassing the marker format string.
> >
> > However, letting the marker specify the data format so a tracer could
> > format it into a memory buffer (in a binary or text format, depending on
> > the implementation) or so that a tool like systemtap can use this
> > identified information without having to be closely tied to the kernel
> > makes sense to me.
>
> So s-tap is meant to parse this sting and interpret the varargs without
> being closely tied to the kernel? - Somehow that doesn't make me feel
> warm and fuzzy. That not only ties userspace to the information present
> in the marker, but to the actual string as well.
>
> The stronger you make this bind the less I like it.
>

Well, the string contains each field name and type. Therefore, SystemTAP
can hook on a marker and parse the string looking for some elements by
passing a NULL format string upon probe registration. Alternatively, it
can provide the exact format string expected when it registers its probe
to the marker and a check will be done to verify that the format string
passed along with the registered probe matches the marker format string.

Also, about what you said earlier in this thread :
"Regular trace points can be custom made; this has the advantages that
it raises the implementation barrier and hopefully that encourages some
thought in the process. It also avoid the code from growing into
something that looks like someone had a long night of debugging."

Before it has been moved to the markers, LTTng was once designed with
custom-made code to save the trace information through custom hooks. To
help maintainers instrument their own subsystem and do the right choice
without being a tracing expert, we created a code generator which
generated this custom code for each trace point given a description of
the trace points. It turned out that keeping this duplicate list of
trace points was cumbersome and that the generated code did eat a lot of
instruction cache. This is why to turned to markers, so we could re-use
a common infrastructure to serialize the data into trace buffers. We
turned to the marker format string to allow the types to serialize to be
parsed efficiently by the tracer. I strongly recommend not to declare
the types associated with a kernel trace point in two unrelated
locations without type checking in-between them (e.g. trace_mark in
kernel code, string in the tracer module), because it would then become
harder to track consistency when the code changes.

However, I would not be against an hybrid of Masami's proposal and
current markers, which I will propose in reply to his email.

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Pierre Ossman: "Re: [PATCH 2/8] Alchemy: register mmc platform device fordb1200/pb1200 boards."
Previous message: Takashi Iwai: "Re: ALC883 recording troubles..."
Next in thread: Peter Zijlstra: "Re: Kernel marker has no performance impact on ia64."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]