Re: [patch] Performance Counters for Linux, v3

From: Peter Zijlstra
Date: Fri Dec 12 2008 - 03:26:10 EST


On Thu, 2008-12-11 at 13:02 -0500, Vince Weaver wrote:

> I have at least 60 machines that I do regular performance counter work on.
> They involve Pentium Pro, Pentium II, 32-bit Athlon, 64-bit Athlon,
> Pentium 4, Pentium D, Core, Core2, Atom, MIPS R12k, Niagara T1,
> and PPC/Playstation 3.

Good.

> Perfmon3 works for all of those 60 machines. This new proposal works on a
> 2 out of the 60.

s/works/is implemented/

> Who is going to add support for all of those machines? I've spent a lot
> of developer time getting prefmon going for all of those configurations.
> But why should I help out with this new inferior proposal? It could all
> be another waste of time.

So much for constructive critisism.. have you tried taking the design to
its limits, if so, where do you see problems?

I read the above as: I invested a lot of time in something of dubious
statue (out of tree patch), and now expect it to be merged because I
have invested in it.

> Also, my primary method of using counters is total aggregate count for a
> single user-space process.

Process, as in single thread, or multi-threaded? I'll assume
single-thread.

> Can this new infrastructure to this?

Yes, afaict it can.

You can group counters in v3, a read out of such a group will be an
atomic read out and provide vectored output that contains all the data
in one stream.

> I find the documentation/tools support to be very incomplete.

Gosh, what does one expect from something that is hardly a week old..

> One comment on the patch.
>
> > + /*
> > + * Common hardware events, generalized by the kernel:
> > + */
> > + PERF_COUNT_CYCLES = 0,
> > + PERF_COUNT_INSTRUCTIONS = 1,
> > + PERF_COUNT_CACHE_REFERENCES = 2,
> > + PERF_COUNT_CACHE_MISSES = 3,
> > + PERF_COUNT_BRANCH_INSTRUCTIONS = 4,
> > + PERF_COUNT_BRANCH_MISSES = 5,
>
> Many machines do not support these counts. For example, Niagara T1 does
> not have a CYCLES count. And good luck if you think you can easily come
> up with something meaningful for the various kind of CACHE_MISSES on the
> Pentium 4. Also, the Pentium D has various flavors of retired instruction
> count with slightly different semantics. This kind of abstraction should
> be done in userspace.

I'll argue to disagree, sure such events might not be supported by any
particular hardware implementation - but the fact that PAPI gives a list
of 'common' events means that they are, well, common. So unifying them
between those archs that do implement them seems like a sane choice, no?

For those archs that do not support it, it will just fail to open. No
harm done.

The proposal allows for you to specify raw hardware events, so you can
just totally ignore this part of the abstraction.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/