Re: [announce] Performance Counters for Linux, v6

From: Paul Mackerras
Date: Mon Mar 09 2009 - 19:01:46 EST


Robert Richter writes:

> Some points to mention here. This patch set actually introduces two
> interfaces: a new user/kernel interface and an in-kernel API to
> access performance counters. These are separate things and are
> sometimes mixed together too much. There is a strong need for an
> in-kernel API. This is the

We have been concentrating more on the user/kernel API since that is
the one that cannot be changed in an incompatible way once this stuff
goes upstream. The in-kernel API can be changed at any time and is
still evolving.

> third implementation I am involved in (oprofile and perfmon are the
> others), and the issues are always the same. All these subsystems
> should be merged into one in-kernel implementation and share the same
> code. The different user/kernel interfaces could then coexist and
> meet users' different needs.

It would certainly be good to get oprofile to use the same low-level
machinery as perf_counters. I'm not sure what the fate of perfmon
will be, but it seems unlikely it will go upstream in anything like
its present form.

> > +static const int intel_perfmon_event_map[] =
> > +{
> > + [PERF_COUNT_CPU_CYCLES] = 0x003c,
> > + [PERF_COUNT_INSTRUCTIONS] = 0x00c0,
> > + [PERF_COUNT_CACHE_REFERENCES] = 0x4f2e,
> > + [PERF_COUNT_CACHE_MISSES] = 0x412e,
> > + [PERF_COUNT_BRANCH_INSTRUCTIONS] = 0x00c4,
> > + [PERF_COUNT_BRANCH_MISSES] = 0x00c5,
> > + [PERF_COUNT_BUS_CYCLES] = 0x013c,
> > +};
>
> I would like to define _all_ the behaviour of the architecture and
> the models in functions instead of parameters and lists. It is hard
> to explain why, because it is mostly a matter of aesthetics, but I
> believe only nice designs work well. Let me try.
>
> 1) The list above seems to be arbitrary: there are lots of events
> and it is hard to decide which events are really important. Surely
> these events are important, but it is hard to draw a line here.

I see that list as a convenience for doing a few simple performance
measurements. For any serious in-depth analysis userspace will know
what processor it's running on and use raw event codes.
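To make the two paths concrete, here is a minimal userspace-style sketch of the split Paul describes: generic event indices resolved through a per-model table (analogous to the quoted intel_perfmon_event_map), with raw event codes bypassing the table entirely. All names here are simplified for illustration and are not the actual kernel identifiers.

```c
#include <stdint.h>

/* Hypothetical generic event indices, mirroring the PERF_COUNT_*
 * list quoted above (names shortened for this sketch). */
enum {
	EV_CPU_CYCLES,
	EV_INSTRUCTIONS,
	EV_CACHE_REFERENCES,
	EV_CACHE_MISSES,
	EV_MAX,
};

/* Per-CPU-model table mapping generic indices to raw hardware event
 * codes, analogous to intel_perfmon_event_map in the quoted patch. */
static const uint64_t intel_event_map[EV_MAX] = {
	[EV_CPU_CYCLES]       = 0x003c,
	[EV_INSTRUCTIONS]     = 0x00c0,
	[EV_CACHE_REFERENCES] = 0x4f2e,
	[EV_CACHE_MISSES]     = 0x412e,
};

/* Resolve an event to a raw hardware code: generic events go through
 * the table, while a raw request bypasses it entirely, which is the
 * path an in-depth analysis tool would take. */
static uint64_t resolve_event(int generic, int is_raw, uint64_t raw_code)
{
	if (is_raw)
		return raw_code;
	return intel_event_map[generic];
}
```

The point of the table is only convenience: a tool that knows the exact CPU model never needs it.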

> 2) The list assumes/implies that the events are available on all
> architectures and cpus. This is probably not the case; an event may
> not even be _important_ for a certain architecture, but it has to be
> there even if it is of no interest.
>
> 3) Hard to extend. If an event is added here, this could have an
> impact on all other architectures, since the data structures change.
>
> 4) In the kernel the behaviour of a subsystem is often implemented
> by functions (e.g. struct device_driver). There are lots of ops
> structs in the kernel, and there are reasons for that.
>
> 5) ops structs are more dynamic. The data can be generated
> dynamically and does not have to live in static tables and
> variables.
>
> So, instead of making the list a public data structure, better pass
> the type to an arch specific function, e.g.:
>
> int arch_xxx_setup_event(int event_type);

That's exactly what we have, except that it's called
hw_perf_counter_init and the event_type you have there is in the
struct perf_counter that gets passed in.

> If the type is not supported, an error could be returned. There is
> no wider impact. Even the built binaries would be identical if
> hw_event_types were extended for a single other architecture.
>
> The same also applies to counters and so on: better to implement
> functions.

All of that is already done; hw_perf_counter_init gets to interpret
the counter->hw_event.type and counter->hw_event.raw fields and decide
whether the event is supported, and return an error if not. On x86 it
looks like there is a further ops structure (struct pmc_x86_ops) which
allows each x86-compatible cpu type to supply its own functions for
doing the interpretation of counter->hw_event and other things.
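The dispatch pattern described above can be sketched roughly as follows. This is not the actual kernel code, just an illustration of the idea: a generic entry point (in the spirit of hw_perf_counter_init) hands the event to a model-specific ops struct (in the spirit of struct pmc_x86_ops), which validates it and returns an error for unsupported events.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops struct: each CPU model supplies its own hook for
 * validating and interpreting event types. */
struct pmc_ops {
	int (*event_init)(int event_type);
};

/* A demo model that supports only event types 0 and 1; anything else
 * is rejected, with no impact on other models. */
static int demo_event_init(int event_type)
{
	if (event_type == 0 || event_type == 1)
		return 0;		/* supported on this model */
	return -EINVAL;			/* unsupported on this model */
}

static const struct pmc_ops demo_ops = {
	.event_init = demo_event_init,
};

/* Generic entry point, analogous in spirit to hw_perf_counter_init:
 * dispatch to the model-specific hook and propagate its verdict. */
static int counter_init(const struct pmc_ops *ops, int event_type)
{
	if (!ops || !ops->event_init)
		return -ENOSYS;
	return ops->event_init(event_type);
}
```

Extending one model's event_init leaves every other model's code and binaries untouched, which is exactly the property Robert is arguing for.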

Paul.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/