Re: [RFC PATCH 1/3] Unified trace buffer

From: Ingo Molnar
Date: Thu Sep 25 2008 - 16:13:21 EST



* Ingo Molnar <mingo@xxxxxxx> wrote:

> firstly, for the sake of full disclosure, the very first versions of
> the latency tracer (which, through hundreds of revisions, morphed into
> ftrace), used raw TSC timestamps.
>
> I stuck to that simple design for a _long_ time because i shared your
> exact views about robustness and simplicity. But it was pure utter
> nightmare to get the timings right after the fact, and i got a _lot_
> of complaints about the quality of timings, and i could never _trust_
> the timings myself for certain types of analysis.
>
> So i eventually went to the scheduler clock and never looked back.
>
> So i've been there, i've done that. In fact i briefly tried to use the
> _GTOD_ clock for tracing - that was utter nightmare as well, because
> the scale and breadth of the GTOD code is staggering.

heh, and i even have a link to a latency tracing patch from 2005 that is
still alive and proves it:

http://people.redhat.com/mingo/latency-tracing-patches/patches/latency-tracing.patch

(don't look at the quality of that code too much)

It has this line for timestamp generation:

+ timestamp = get_cycles();

i.e. we used the raw TSC, we used RDTSC straight away, and we used that
for _years_, literally.
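
To make the contrast concrete, here is a minimal user-space sketch (not the
tracer's code) that takes both kinds of timestamp side by side. The names
raw_cycles(), mono_ns(), struct trace_stamp and take_stamp() are made up for
illustration, and CLOCK_MONOTONIC is only an assumed user-space stand-in for a
scheduler-style clock; it is not sched_clock() itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Raw cycle counter, i.e. a plain RDTSC on x86.
 * The unit is CPU cycles, not nanoseconds, and nothing here
 * corrects for frequency changes or per-CPU offsets. */
static inline uint64_t raw_cycles(void)
{
#if defined(__x86_64__) || defined(__i386__)
	return __rdtsc();
#else
	return 0;	/* no TSC on this architecture; sketch only */
#endif
}

/* Assumed stand-in for a scheduler-style clock: monotonic nanoseconds. */
static inline uint64_t mono_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical trace record carrying both timestamps. */
struct trace_stamp {
	uint64_t cycles;	/* raw, cheap, hard to interpret later */
	uint64_t ns;		/* costs more, directly comparable */
};

static inline struct trace_stamp take_stamp(void)
{
	struct trace_stamp s;

	s.cycles = raw_cycles();
	s.ns = mono_ns();
	return s;
}
```

Taking one stamp before and one after an event, the difference of the ns
fields is directly a duration. The difference of the cycles fields still has
to be converted with a calibrated frequency, and is only meaningful if both
readings came from counters that agree with each other, which is exactly the
after-the-fact fixup work described above.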

So i can tell you my direct experience with it: i had far more problems
with the tracer due to inexact timings and traces that i could not
depend on, than i had problems with sched_clock() locking up or
crashing.

Far more people complained about the accuracy of timings than about
performance or about the ability (or inability) to stream gigs of
tracing data to user-space.

It was a very striking difference:

- every second person who used the tracer observed that the timings
looked odd at places.

- only about once every 6 months did someone ask whether he could save
gigabytes of trace data.

For years i maintained a tracer with TSC timestamps, and for years i
maintained another tracer that used sched_clock(). Exact timings are a
feature most people are willing to spend extra cycles on.

You seem to dismiss that angle by calling my arguments bullshit, but i
don't know on what basis you dismiss it. Sure, a feature and extra
complexity _always_ has a robustness cost. If your argument is that we
should move cpu_clock() to assembly to make it more dependable - i'm all
for it.

Ingo