Re: [PATCH 13/15] perf_counter: provide generic callchain bits

From: Peter Zijlstra
Date: Tue Mar 31 2009 - 04:43:03 EST


On Tue, 2009-03-31 at 00:24 -0700, Corey Ashford wrote:
> Peter Zijlstra wrote:
> > On Tue, 2009-03-31 at 17:12 +1100, Paul Mackerras wrote:
> >> Peter Zijlstra writes:
> >>
> >>> include_tid : 1, /* include the tid */
> >>> mmap : 1, /* include mmap data */
> >>> munmap : 1, /* include munmap data */
> >>> + callchain : 1, /* add callchain data */
> >> Interesting, I would have put callchain (and include_tid, also) in
> >> hw_event.record_type rather than as individual 1-bit fields. The
> >> present arrangement where some selection of what goes into the ring
> >> buffer is in record_type and some is in individual bits seems a bit
> >> awkward. Plus, with the current arrangement I can't get both the IP
> >> and the values of the other group members, which I might reasonable
> >> want.
> >>
> >> I think either we make record_type bit-significant, or we define
> >> individual bits in hw_event for recording the IP and other group
> >> members.
> >>
> >> There are a couple of other things I want to be able to record on an
> >> event - we have registers on powerpc that give information about the
> >> event that caused the interrupt, and it would be nice to be able to
> >> record them. (These registers include instruction and data addresses
> >> associated with the event; the instruction address can be further on
> >> from where the interrupt was taken because of out-of-order instruction
> >> execution and because interrupts might be hard-disabled at the point
> >> where the interrupt becomes pending.)
> >>
> >> Those registers would need bits in record_type or in the hw_event to
> >> indicate that we want them recorded.
> >
> > Sure, I'm fine with moving them to record_type and making that a bit
> > field. I've also considered merging the group and 'simple' output to
> > enable what you say.
>
> Originally, the record_type field was used to determine what would be
> read if you read from the fd. However, now it seems to be what you get
> from the mmap buffer (only), that there is no longer a way to read the
> record stream from the fd anymore. Is this true?

Yes, read() will return the current value of the counter.
mmap() is used for all async data streams.

> If so, I like this
> idea because it simplifies the ABI: there is one way to read the current
> value of counters and another way to read sample records (via mmap),
> both of these operations use a single fd.

Correct, glad you like it ;-)

> If this is the case, what is the exact meaning of PERF_COUNTER_SIMPLE
> now? and PERF_COUNTER_GROUP?

Legacy mainly. I've been working towards unifying those. As Paul also
suggested, group can be yet another output option.

> One simplification would be that reading
> the fd of a group leader would always read up all of the counters in the
> group (along with their enabled and running times if those bits are
> set), and that reading a single counter's fd would yield only that
> counter's values and times (if enabled). In effect, these two values,
> PERF_COUNTER_GROUP and PERF_COUNTER_SIMPLE would no longer be necessary
> at all. Other bits would be used to determine what's in the mmap'd samples.

Quite so, that sounds sensible, Paul?

Would you be overly bothered by the read() output also having that
{type,size} header we use for the mmap() data?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/