Re: [patch for 2.6.26 0/7] Architecture Independent Markers

From: Ingo Molnar
Date: Fri Mar 28 2008 - 09:33:28 EST



* Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxx> wrote:

> 6a5: 89 5c 24 14 mov %ebx,0x14(%esp)
> 6a9: 8b 55 d0 mov -0x30(%ebp),%edx
> 6ac: 89 54 24 10 mov %edx,0x10(%esp)
> 6b0: 89 4c 24 0c mov %ecx,0xc(%esp)
> 6b4: c7 44 24 08 f7 04 00 movl $0x4f7,0x8(%esp)
> 6bb: 00
> 6bc: c7 44 24 04 00 00 00 movl $0x0,0x4(%esp)
> 6c3: 00
> 6c4: c7 04 24 00 00 00 00 movl $0x0,(%esp)
> 6cb: ff 15 0c 00 00 00 call *0xc
> 6d1: e9 c3 fc ff ff jmp 399 <schedule+0x130>
>
> Which adds an extra 50 bytes.

you talk about 32-bit while i talk about 64-bit. All these costs go up
on 64-bit and you should know that. I measured 44 bytes in the fastpath
and 52 bytes in the slowpath, which gives 96 bytes. (with a distro
.config and likely with a different gcc)

96 bytes _per marker_ sprinkled throughout the kernel. This blows up the
cache footprint of the kernel quite substantially, because it's all
fragmented - even if this is in the 'slowpath'.

so yes, that is the bloat i'm talking about.

don't just compare it to ftrace-sched-switch, compare it to dyn-ftrace
which gives us more than 78,000 trace points in the kernel _here and
today_ at no measurable runtime cost, with a 5 byte NOP per trace point
and _zero_ instruction stream (register scheduling, etc.) intrusion. No
slowpath cost.
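
To put the sizes above side by side, here is a rough back-of-the-envelope sketch in shell arithmetic. It uses only the figures quoted in this mail (44 + 52 bytes per marker on 64-bit, a 5 byte NOP per dyn-ftrace site, 78,000 sites). The variable names are made up for illustration, and the exact byte counts will vary with .config and gcc version:

```shell
#!/bin/sh
# Figures taken from the mail above; variable names are illustrative only.
fastpath=44     # bytes per marker in the fastpath (64-bit, as measured above)
slowpath=52     # bytes per marker in the slowpath
nop=5           # bytes per dyn-ftrace trace point
sites=78000     # dyn-ftrace trace points quoted above

per_marker=$((fastpath + slowpath))
nop_total=$((sites * nop))
# How many markers would take up as much text as all the NOPs together
# (integer division, so this rounds down).
equiv_markers=$((nop_total / per_marker))

echo "bytes per marker:           $per_marker"
echo "bytes for all 5-byte NOPs:  $nop_total"
echo "markers of equal text size: $equiv_markers"
```

This only compares raw text size; it says nothing about where those bytes land in the instruction stream, which is the other half of the argument above.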

and the basic API approach of markers is flawed as well - the coupling to
the kernel is too strong. The correct and long-term maintainable
coupling is via ASCII symbol names, not via any binding built into the
kernel.

With dyn-ftrace (see sched-devel.git/latest) tracing filters can be
installed trivially by users, via function _symbols_, via:

/debugfs/tracing/available_filter_functions
/debugfs/tracing/set_ftrace_filter

wildcards are recognized as well, so if you do:

echo '*lock' > /debugfs/tracing/set_ftrace_filter

all functions whose names end in 'lock' will have their tracepoints
activated transparently from that point on.

even multiple names can be passed in at once:

echo 'schedule wake_up* *acpi*' > /debugfs/tracing/set_ftrace_filter
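
To see what a set of patterns would select before writing it to set_ftrace_filter, something like the following sketch can help. It is not part of ftrace: preview_filter is a hypothetical helper, it assumes the list file holds one function name per line (as available_filter_functions does here), and it uses the shell's own glob matching, which is a superset of the leading/trailing '*' forms shown above. The sample list is made up so that the script runs without debugfs mounted:

```shell
#!/bin/sh
# preview_filter LIST PATTERN... (hypothetical helper, not an ftrace tool)
# Print every name in LIST that matches at least one of the glob PATTERNs.
preview_filter() {
    list_file=$1
    shift
    while IFS= read -r fn; do
        for pat in "$@"; do
            # $pat is deliberately unquoted so the shell treats it as a glob.
            case $fn in
                $pat) printf '%s\n' "$fn"; break ;;
            esac
        done
    done < "$list_file"
}

# Made-up sample standing in for /debugfs/tracing/available_filter_functions.
sample=$(mktemp)
printf '%s\n' schedule wake_up_process spin_lock spin_unlock \
    acpi_init lock_kernel > "$sample"

# Selects names ending in 'lock': spin_lock and spin_unlock, not lock_kernel.
preview_filter "$sample" '*lock'

# Several patterns at once: schedule, wake_up_process, acpi_init.
preview_filter "$sample" 'schedule' 'wake_up*' '*acpi*'

rm -f "$sample"
```

On a kernel with dyn-ftrace the same helper could be pointed at the real list, e.g. preview_filter /debugfs/tracing/available_filter_functions '*lock', assuming debugfs is mounted at /debugfs as in the paths above.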

so it's trivial to use it, very powerful and we've only begun exposing
it towards users. I see no good reason why we'd patch any marker into
the kernel - it's a maintenance cost from that point on.

so yes, my argument is: tens of thousands of lightweight tracepoints in
the kernel here and today, which are configurable via function names,
each of which can be turned on and off individually, and none of which
needs any source code level changes - is an obviously superior approach.

Ingo