Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units

From: Alexander Shishkin
Date: Wed Dec 18 2013 - 08:24:23 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> On Wed, Dec 11, 2013 at 02:36:16PM +0200, Alexander Shishkin wrote:
>> Instruction tracing PMUs are capable of recording a log of instruction
>> execution flow on a cpu core, which can be useful for profiling and crash
>> analysis. This patch adds itrace infrastructure for perf events and the
>> rest of the kernel to use.
>>
>> Since such PMUs can produce copious amounts of trace data, it may be
>> impractical to process it inside the kernel in real time, but instead export
>> raw trace streams to userspace for subsequent analysis. Thus, itrace PMUs
>> may export their trace buffers, which can be mmap()ed to userspace from a
>> perf event fd with a PERF_EVENT_ITRACE_OFFSET offset. To that end, perf
>> is extended to work with multiple ring buffers per event, reusing the
>> ring_buffer code in an attempt to reduce complexity.
>
> Please read the thread here: https://lkml.org/lkml/2008/12/4/64
>
> On my thoughts of this creative mmap() usage.

That's unfortunate; it made sense to me. But let's then have a look at
the alternative approaches. Three constraints apply: it is crucial for
us to export trace buffers to userspace rather than process the trace
data in the kernel, we still need the normal perf data stream, and you
dislike mmap trickery. Together, these mean we need two separate file
descriptors: one for the perf data and one for the trace data.

One way of doing this would be to call sys_perf_event_open() once for
each. The first call would return a file descriptor that provides the
good old perf data buffer; the second call would pass this file
descriptor as the group leader and return another descriptor (thus
creating another perf_event), which, when mmap()ed, would provide a
trace buffer.
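
A minimal userspace sketch of this first option might look like the
following. It is an illustration under assumptions, not an existing
interface: the itrace PMU type number is a hypothetical parameter (a real
dynamic PMU would publish its type through sysfs), the helper names are
made up for this example, and whether the kernel would accept a trace
event grouped under an ordinary leader is exactly what is being proposed
here. Only the perf_event_open syscall and the perf_event_attr fields are
existing API.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper: glibc does not provide a perf_event_open() function. */
long perf_event_open_sys(struct perf_event_attr *attr, pid_t pid, int cpu,
                         int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Attributes for the ordinary perf data event (the group leader). */
void init_data_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->disabled = 1;
}

/*
 * Attributes for the trace event.
 * ASSUMPTION: itrace_pmu_type is a hypothetical PMU type number; nothing
 * in this thread defines one.
 */
void init_trace_attr(struct perf_event_attr *attr, unsigned int itrace_pmu_type)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = itrace_pmu_type;
    attr->disabled = 1;
}

/*
 * Open both descriptors: fds[0] carries the normal perf data buffer,
 * fds[1] would (under the proposal) expose the trace buffer on mmap().
 * Returns 0 on success, -1 on failure with errno set by the syscall.
 */
int open_data_and_trace(pid_t pid, int cpu, unsigned int itrace_pmu_type,
                        int fds[2])
{
    struct perf_event_attr attr;

    init_data_attr(&attr);
    fds[0] = (int)perf_event_open_sys(&attr, pid, cpu, -1, 0);
    if (fds[0] < 0)
        return -1;

    /* Second call: pass the first descriptor as the group leader. */
    init_trace_attr(&attr, itrace_pmu_type);
    fds[1] = (int)perf_event_open_sys(&attr, pid, cpu, fds[0], 0);
    if (fds[1] < 0) {
        close(fds[0]);
        return -1;
    }
    return 0;
}
```

Each descriptor would then be mmap()ed separately at offset 0, so no
special offset is needed to tell the two buffers apart.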

Or, we could introduce a new PERF_FLAG_XXX to mean that we want a
descriptor with a trace buffer. And then, of course, one could always
add an ioctl(), but that'd probably be a bit over the top.

Do any of these sound reasonable? Any other possibilities that I'm
missing here?

Thanks,
--
Alex