Re: [PATCH v4 2/3] perf: Userspace event

From: Pawel Moll
Date: Wed Jan 21 2015 - 11:02:18 EST


On Mon, 2015-01-05 at 13:12 +0000, Peter Zijlstra wrote:
> On Thu, Nov 06, 2014 at 04:51:57PM +0000, Pawel Moll wrote:
> > This patch adds a PR_TASK_PERF_UEVENT prctl call which can be used by
> > any process to inject custom data into perf data stream as a new
> > PERF_RECORD_UEVENT record, if such process is being observed or if it
> > is running on a CPU being observed by the perf framework.
> >
> > The prctl call takes the following arguments:
> >
> > prctl(PR_TASK_PERF_UEVENT, type, size, data, flags);
> >
> > - type: a number describing the content of the following data. The
> > kernel does not interpret it and merely passes it on in
> > the perf data, therefore its use must be agreed between the events
> > producer (the process being observed) and the consumer (performance
> > analysis tool). The perf userspace tool will contain a repository of
> > "well known" types and reference implementation of their decoders.
> > - size: Length in bytes of the data.
> > - data: Pointer to the data.
> > - flags: Reserved for future use. Always pass zero.
> >
> > Perf contexts that are supposed to receive events generated with the
> > prctl above must be opened with perf_event_attr.uevent set to 1. The
> > PERF_RECORD_UEVENT records consist of a standard perf event header,
> > 32-bit type value, 32-bit data size and the data itself, followed by
> > padding to align the overall record size to 8 bytes and an optional,
> > standard sample_id field.
> >
> > Example use cases:
> >
> > - "perf_printf" like mechanism to add logging messages to perf data;
> > in the simplest case it can be just
> >
> > prctl(PR_TASK_PERF_UEVENT, 0, 8, "Message", 0);
> >
> > - synchronisation of performance data generated in user space with the
> > perf stream coming from the kernel. For example, the marker can be
> > inserted by a JIT engine after it has generated a portion of the code, but
> > before the code is executed for the first time, allowing the
> > post-processor to pick the correct debugging information.
>
> The thing I remember being raised was a unified means of these msgs
> across perf/ftrace/lttng. I am not seeing that mentioned.

Right. I was considering the "well known types repository" an attempt in
this direction. Having said that - ftrace also takes a random blob as
the trace marker, so the unification has to happen in userspace anyway.
I'll have a look at what LTTng has to say in this respect.

> Also, I would like a stronger rationale for the @type argument, if it
> has no actual meaning why is it separate from the binary msg data?

Valid point. Without type 0 defined as a string, it doesn't bring
anything into the equation. I just have a gut feeling that sooner rather
than later we will want to split the messages somehow. Maybe we should make
it a "reserved for future use, use 0 now" field?

* struct {
* struct perf_event_header header;
* u32 __reserved; /* always 0 */
* u32 size;
* char data[size];
* char __padding[-size & 7];
* struct sample_id sample_id;
* };

or, probably even better, make it a version value at a known offset
(currently always 1, with just the size and arbitrarily sized data following).

* struct {
* struct perf_event_header header;
* u32 version; /* use 1 */
* u32 size;
* char data[size];
* char __padding[-size & 7];
* struct sample_id sample_id;
* };

That way we can change the user event format without too much pain:
parsers will simply complain about an unknown format if they meet one
and, since the size of the record is in the header, they can skip the
record.
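
To illustrate the skipping behaviour, here is a rough userspace sketch,
assuming the versioned layout above. UEVENT_VERSION_KNOWN, struct
rec_header, uevent_padding() and uevent_parse() are names made up for
this example; none of them exist in the kernel or in the perf tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the only format version this parser understands. */
#define UEVENT_VERSION_KNOWN 1u

/* Same field layout as struct perf_event_header. */
struct rec_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;   /* total record size, header included */
};

enum uevent_status { UEVENT_OK, UEVENT_UNKNOWN_VERSION, UEVENT_MALFORMED };

/* Padding that rounds the payload up to 8 bytes, i.e. "-size & 7". */
static uint32_t uevent_padding(uint32_t size)
{
    return (0u - size) & 7u;
}

/*
 * Parse one record at buf. Once the header has been validated, *next
 * holds the offset of the following record, so the caller can skip a
 * record whose version it does not understand.
 */
static enum uevent_status uevent_parse(const uint8_t *buf, size_t len,
                                       const uint8_t **data, uint32_t *size,
                                       size_t *next)
{
    struct rec_header hdr;
    uint32_t version, payload;

    assert(buf && data && size && next);

    if (len < sizeof(hdr))
        return UEVENT_MALFORMED;
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.size < sizeof(hdr) + 8 || hdr.size > len)
        return UEVENT_MALFORMED;

    *next = hdr.size;

    memcpy(&version, buf + sizeof(hdr), sizeof(version));
    if (version != UEVENT_VERSION_KNOWN)
        return UEVENT_UNKNOWN_VERSION;   /* complain, then skip via *next */

    memcpy(&payload, buf + sizeof(hdr) + 4, sizeof(payload));
    /* Anything left after data + padding is the optional sample_id. */
    if ((uint64_t)sizeof(hdr) + 8 + payload + uevent_padding(payload) > hdr.size)
        return UEVENT_MALFORMED;

    *data = buf + sizeof(hdr) + 8;
    *size = payload;
    return UEVENT_OK;
}
```

A consumer would call uevent_parse() on each user event record and, on
UEVENT_UNKNOWN_VERSION, warn once and advance by *next without touching
the payload.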

Pawel

