Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules

From: Ingo Molnar
Date: Tue Dec 20 2011 - 06:10:28 EST



(Cc:-ing Arnaldo on this as well.)

* Mathieu Desnoyers <compudj@xxxxxxxxxxxxxxxxxx> wrote:

> > Mathieu, any update on this? I don't want the LTTNG goodies
> > to drop on the floor - we just have to integrate them
> > properly.
> >
> > If you 100% disagree with how specific things are done
> > upstream right now then don't hold back: just replace
> > existing mechanisms - that gives a starting point to discuss
> > what the best way is forward.
>
> I'm bringing a tough question then: what should we do if I
> strongly think that the current ABIs should be replaced? To
> support this, let's note that the current perf ABI:
>
> - lacks versioning information to handle change. [...]

That's not actually true on *any* level: we are changing,
evolving and extending the perf ABIs all the time.

There are two main API/ABI components:

1) the perf syscall which is part of the Linux syscall ABI.

Individual versions of the ABI have (monotonically increasing)
sizes for "struct perf_event_attr" - you can consider these
natural ABI versioning.

So the 'versioning' is not done via some inflexible and ugly,
Windows-alike 'explicit ABI version' field, but done via
structure sizes and -ENOSYS.

We've iterated and versioned it numerous times in the past 10
kernel releases, in a backwards compatible manner.
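
A minimal sketch of how size-based versioning of this kind can work,
assuming a hypothetical two-version layout (this is NOT the real
"struct perf_event_attr", and the field names, sizes and exception
types are made up for illustration). The receiver zero-fills fields an
older caller did not know about, and accepts a larger structure from a
newer caller only if the part it does not understand is all zeroes:

```python
import struct

# Hypothetical layouts, not the real struct perf_event_attr:
#   v0 = type (u32), size (u32), config (u64)   -> 16 bytes
#   v1 = v0 + sample_period (u64)               -> 24 bytes
ATTR_SIZE_VER0 = 16
ATTR_SIZE_VER1 = 24
KERNEL_ATTR_SIZE = ATTR_SIZE_VER1  # the size this "kernel" understands


class AttrTooBig(Exception):
    """The caller set fields this receiver does not know about."""


def copy_attr(user_bytes: bytes) -> bytes:
    """Normalise a caller-supplied attr blob to KERNEL_ATTR_SIZE bytes."""
    if len(user_bytes) < 8:
        raise ValueError("attr too short to carry a size field")
    # The second u32 is the size the caller believes the struct has.
    _type, size = struct.unpack_from("<II", user_bytes, 0)
    if size < ATTR_SIZE_VER0 or size > len(user_bytes):
        raise ValueError("bad size field")
    attr = user_bytes[:size]
    if size > KERNEL_ATTR_SIZE:
        # Newer caller, older receiver: fine as long as the unknown
        # tail is unused (all zero bytes).
        if any(attr[KERNEL_ATTR_SIZE:]):
            raise AttrTooBig(KERNEL_ATTR_SIZE)
        return attr[:KERNEL_ATTR_SIZE]
    # Older caller, newer receiver: missing fields default to zero.
    return attr + bytes(KERNEL_ATTR_SIZE - size)
```

In this sketch the size field doubles as the version: no separate
version number is needed, and both old-caller/new-receiver and
new-caller/old-receiver combinations keep working as long as new
fields are only ever appended.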

2) the perf.data file

The versioning there is capability bitmask based - modelled
after ext2/ext3/ext4 capability bitmasks. It's extensible as
well.
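
As a rough illustration of the capability-bitmask idea (the feature
names and bit positions below are hypothetical and do not reflect the
actual perf.data header), a reader can split the bits found in a file
into those it understands and those it should skip:

```python
# Hypothetical feature bits, for illustration only.
FEAT_BUILD_ID = 1 << 0
FEAT_HOSTNAME = 1 << 1
FEAT_CPU_TOPOLOGY = 1 << 2

# Features this (older) reader was built to understand.
KNOWN_FEATURES = FEAT_BUILD_ID | FEAT_HOSTNAME


def split_features(file_mask: int, known_mask: int = KNOWN_FEATURES):
    """Return (usable, unknown) feature bits for a given file mask."""
    usable = file_mask & known_mask
    unknown = file_mask & ~known_mask
    return usable, unknown
```

A file written by a newer tool with all three bits set would yield
usable = BUILD_ID | HOSTNAME and unknown = CPU_TOPOLOGY, so an older
reader can still process what it knows and ignore the rest.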

I think your concentration on ABIs is missing a very fundamental
property of instrumentation:

the life-time and persistence of instrumentation data is
typically very short ('days' is already an exception - typical
is minutes, at most hours), and for that reason we haven't been
getting much pressure from users to maintain a perf.data ABI -
but we are doing it nevertheless.

Instrumentation is fundamentally about the 'here and now' and so
it fundamentally differs from things like backup formats and
database formats. An ABI does not hurt and we are maintaining
it, but you are overrating its importance significantly.


> [...] I think shipping the tracer tools within the Linux
> tools/ directory made sense for an initial phase that made
> tracer solutions more popular for kernel developers (and it
> did a great job at that), but if we want to move on to build
> tools that target a wider audience, we should leave the
> tools/ sandbox and create separate projects, with clearly
> defined ABIs, using ABI versioning to manage changes. At
> this point, I think that perf tool shipped within tools/ is
> more than anything a pain for non-kernel-developer users,
> and favors design of sloppy ABIs.

I think you've thoroughly misunderstood the upstream ABI
versioning status quo, which makes your argument out of this
world.

The perf ABIs are well-defined and well-maintained. See an
ad-hoc ABI and tool compatibility experiment I made here:

[F.A.Q.] perf ABI backwards and forwards compatibility
https://lkml.org/lkml/2011/11/8/77

> - makes it impossible to move to CTF (Common Trace Format)
> and benefit from the added features it allows,

"CTF" was mainly written by yourself, right?

If there's any tool worth caring about that wants to deal in CTF
then it can be converted just fine. I don't think it matters
nearly as much as you seem to imply, see my reply further below.

> - makes it needlessly hard, if not impossible, for perf to
> move to something that would have the benefits brought by
> the fast unified ring buffer code I created 2 years ago,

The current upstream code actually has a fast unified
ring-buffer, mmap()-ed to user-space, so you'd have to be a bit
more specific about that point.

> - makes it impossible to benefit from the LTTng fast trace
> clocks.

We have various trace clocks upstream as well - so you'd have to
outline specifically why it's "impossible".

> Also, it should be noted that I am finding that the way perf
> evolved into a large monolithic binary blob that needs to be
> all enabled or all disabled makes it quite hard to extend and
> re-use. [...]

There's a (very) healthy influx of features - it's one of the
most active kernel and userspace projects we have.

So *others* don't find it hard to work with. If you have
specific observations I'm sure Arnaldo will appreciate them.

[ I snipped the rest of your reply - you seem to have deep
rooted misconceptions about what the current upstream
principles and practices are in this area: you are banging on
open doors! ]

Anyway, my prior request+offer stands: please split LTTNG up
into individual feature blocks done to extend or replace
existing instrumentation features and offer them as changes to
existing upstream instrumentation code. We want every
conceivable useful feature, but we *really* don't want
schizophrenic duplication in this area.

Thanks,

Ingo