Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

From: Mark Rutland
Date: Thu Apr 16 2015 - 12:32:10 EST


Hi,

If you're going to fundamentally change the behaviour of
perf_invalid_context, please Cc authors of other system PMU drivers.
Intel aren't the only ones with such PMUs.

For instance, this affects the ARM CCI and CCN PMU drivers.

On Wed, Apr 15, 2015 at 08:56:11AM +0100, Kan Liang wrote:
> From: Kan Liang <kan.liang@xxxxxxxxx>
>
> The pmu marked as perf_invalid_context don't have any state to switch on
> context switch. Everything is global. So it is OK to be part of sw/hw
> groups.
> In sched_out/sched_in, del/add must be called, so the
> perf_invalid_context event can be disabled/enabled accordingly during
> context switch. The event count only be read when the event is already
> sched_in.
>
> However group read doesn't work with mix events.
>
> For example,
> perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> It always gets EINVAL.

From my PoV that makes sense. One is CPU-affine, the other is not, and
the two cannot be scheduled in the same PMU transaction by the nature of
the hardware. Fundamentally, you cannot provide group semantics due to
this.

Even if you ignore the fundamental semantics of groups, there are other
problems with allowing shared contexts:

* The *_txn functions only get called on the group leader's PMU. If your
system PMU implements these functions but is not the leader's PMU, they
are never called.

* Event rotation is per ctx, but now you could have some events in a CPU
PMU's context, and some in the uncore PMU's context. So those can race
with each other.

* Throttling is also per-context. So those can race with each other too.
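
To illustrate the first point, here is a toy model in C. It is not kernel
code, and every name in it (toy_pmu, toy_event, toy_group_sched_in) is
hypothetical. It only assumes that group scheduling takes the transaction
hooks from the leader's PMU, as described above, while each event's add
callback runs on that event's own PMU.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical, not the kernel's. */
struct toy_pmu {
	const char *name;
	int start_txn_calls;
	int commit_txn_calls;
	int add_calls;
};

struct toy_event {
	struct toy_pmu *pmu;
};

void toy_start_txn(struct toy_pmu *pmu)  { pmu->start_txn_calls++; }
void toy_commit_txn(struct toy_pmu *pmu) { pmu->commit_txn_calls++; }
void toy_add(struct toy_event *ev)       { ev->pmu->add_calls++; }

/*
 * Schedule a group in; events[0] is the leader.
 * The transaction is opened and committed on the leader's PMU only,
 * while add is called on each event's own PMU.
 */
void toy_group_sched_in(struct toy_event *events, size_t n)
{
	struct toy_pmu *leader_pmu;
	size_t i;

	assert(events != NULL);
	if (n == 0)
		return;

	leader_pmu = events[0].pmu;

	toy_start_txn(leader_pmu);
	for (i = 0; i < n; i++)
		toy_add(&events[i]);
	toy_commit_txn(leader_pmu);
}
```

With a CPU PMU event as leader and an uncore event as sibling, the uncore
PMU in this model sees its add callback but never its transaction hooks.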

> This patch set intends to fix this issue.
> perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ]

You can already count the events concurrently without grouping them, and
the above implies that this patch just ends up misleading the user
w.r.t. group semantics.
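
As a minimal sketch of counting without grouping, each event can be
opened through perf_event_open(2) as its own leader (group_fd == -1) and
read separately. The helper names below are hypothetical. The type and
config for an uncore event are assumptions left to the caller: the type
would typically come from the PMU's sysfs "type" file, and the config
encoding is hardware-specific.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill in an attr for a plain counting (non-sampling) event. */
void init_counting_attr(struct perf_event_attr *attr,
			uint32_t type, uint64_t config)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = type;
	attr->config = config;
}

/* glibc provides no perf_event_open() wrapper, so use syscall(2). */
long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
			 int cpu, int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Open one event on one CPU for all tasks (pid == -1).
 * group_fd == -1 makes the event its own group leader, so no
 * group semantics are implied between separately opened events.
 * Returns a file descriptor, or -1 with errno set.
 */
int open_ungrouped(uint32_t type, uint64_t config, int cpu)
{
	struct perf_event_attr attr;

	init_counting_attr(&attr, type, config);
	return (int)sys_perf_event_open(&attr, -1, cpu, -1, 0);
}

/* Read one counter value (read_format == 0); returns 0 on success. */
int read_count(int fd, uint64_t *value)
{
	ssize_t n = read(fd, value, sizeof(*value));

	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

A caller would invoke open_ungrouped() once for the CPU event (for
example PERF_TYPE_HARDWARE with PERF_COUNT_HW_CPU_CYCLES) and once for
the uncore event, then call read_count() on each descriptor. Opening
system-wide events usually needs suitable privileges.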

If you want to be able to sample the events with a single read, then you
can attach the FDs.

I don't see that this solves a real problem. I see that it introduces a
new set of problems in addition to complicating existing code.

Mark.