Re: [RFC PATCH] perf: Add PERF_RECORD_SWITCH to indicate context switches

From: Peter Zijlstra
Date: Thu Jun 11 2015 - 10:16:04 EST


On Tue, Jun 09, 2015 at 05:21:10PM +0300, Adrian Hunter wrote:
> Tracepoints are no good at all for non-privileged users
> because they need either CAP_SYS_ADMIN or
> /proc/sys/kernel/perf_event_paranoid <= -1.
>
> On the other hand, kernel software events need either
> CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= 1.

So while I think it makes sense to allow some tracepoints outside of that
priv level, IOW have a per-tracepoint priv level filter thingy, I don't
think sched_switch() is one of those because it explicitly exposes
timing information on other tasks.
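
For reference, the two thresholds quoted above boil down to something
like the check below. This is only a sketch of the rules as stated in
this thread; the names (event_class, may_open) are made up for
illustration and are not kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; not kernel identifiers. */
enum event_class {
	EVENT_TRACEPOINT,	/* e.g. sched:sched_switch */
	EVENT_KERNEL_SW,	/* kernel software events */
};

/*
 * Returns true if a task may open the given class of event, using only
 * the thresholds quoted above:
 *   tracepoints:            CAP_SYS_ADMIN or perf_event_paranoid <= -1
 *   kernel software events: CAP_SYS_ADMIN or perf_event_paranoid <=  1
 */
static bool may_open(enum event_class cls, bool cap_sys_admin, int paranoid)
{
	if (cap_sys_admin)
		return true;

	switch (cls) {
	case EVENT_TRACEPOINT:
		return paranoid <= -1;
	case EVENT_KERNEL_SW:
		return paranoid <= 1;
	}
	return false;
}
```

With perf_event_paranoid set to 1 and no CAP_SYS_ADMIN,
may_open(EVENT_KERNEL_SW, false, 1) is true while
may_open(EVENT_TRACEPOINT, false, 1) is false, which is the gap being
discussed.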

> This new PERF_RECORD_SWITCH event does not have those problems
> and it also has a couple of other small advantages. It is
> easier to use because it is an auxiliary event (like mmap,
> comm and task events) which can be enabled by setting a single
> bit. It is smaller than sched:sched_switch and easier to parse.

Right, so the one wee problem I have is that this only provides sched_in
data; I imagine people might be interested in sched_out as well.

Typically the switch event provides prev and next and thereby is
complete, but since we're limiting it to the one specific task, we'll
not have the sched_out data.
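
To illustrate why the missing half matters, here is a rough consumer-side
sketch that adds up how long the traced task was on the CPU. It assumes a
decoded record with an 'out' flag; that flag and the struct are
hypothetical and not part of the posted patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded record; 'out' is an assumed flag, not in the patch. */
struct switch_rec {
	uint64_t time;	/* timestamp in ns */
	bool out;	/* false: sched_in, true: sched_out */
};

/*
 * Sum the time the traced task spent on the CPU by pairing each sched_in
 * with the next sched_out. With sched_in records only, no interval can
 * be closed and the result is 0.
 */
static uint64_t on_cpu_ns(const struct switch_rec *r, size_t n)
{
	uint64_t total = 0, in_time = 0;
	bool running = false;

	for (size_t i = 0; i < n; i++) {
		if (!r[i].out) {
			/* Task switched in: remember when it started running. */
			in_time = r[i].time;
			running = true;
		} else if (running) {
			/* Task switched out: close the open interval. */
			total += r[i].time - in_time;
			running = false;
		}
	}
	return total;
}
```

Fed only sched_in records, the function has nothing to subtract from, so
a per-task stream needs both directions to be useful for this kind of
accounting.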

> @@ -812,6 +813,18 @@ enum perf_event_type {
>          */
>         PERF_RECORD_ITRACE_START                = 12,
>
> +       /*
> +        *
> +        *
> +        * struct {
> +        *      struct perf_event_header        header;
> +        *      u32                             pid, tid;
> +        *      u64                             time;

all 3 are already part of sample_id.

> +        *      struct sample_id                sample_id;
> +        * };
> +        */
> +       PERF_RECORD_SWITCH                      = 13,
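
To spell out the sample_id point above: a sketch of the two layouts,
assuming sample_id_all is set and sample_type is exactly
PERF_SAMPLE_TID | PERF_SAMPLE_TIME, so that the trailer holds only
pid/tid/time. The struct names are invented for the comparison; only the
header layout mirrors struct perf_event_header.

```c
#include <stdint.h>

/* Same layout as the UAPI struct perf_event_header: 8 bytes. */
struct hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/*
 * sample_id trailer, assuming sample_id_all is set and
 * sample_type == PERF_SAMPLE_TID | PERF_SAMPLE_TIME only.
 */
struct sample_id_tid_time {
	uint32_t pid, tid;
	uint64_t time;
};

/* As posted: pid/tid/time appear twice in every record. */
struct switch_as_posted {
	struct hdr header;
	uint32_t pid, tid;
	uint64_t time;
	struct sample_id_tid_time sample_id;
};

/* Without the explicit fields: the same information, once. */
struct switch_trimmed {
	struct hdr header;
	struct sample_id_tid_time sample_id;
};

_Static_assert(sizeof(struct switch_as_posted) == 40, "8 + 16 + 16");
_Static_assert(sizeof(struct switch_trimmed) == 24, "8 + 16");
```

Under those assumptions the explicit fields cost 16 bytes per record
without adding information. If sample_id_all is not set there is no
trailer at all, so whether dropping them is acceptable depends on
requiring those sample bits.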