Re: [RFC][PATCH] perf: Rewrite core context handling

From: Song Liu
Date: Sat Oct 13 2018 - 04:32:47 EST




> On Oct 12, 2018, at 2:50 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
>
> Can we please not top-post?
>
> On Thu, Oct 11, 2018 at 10:37:14PM +0000, Song Liu wrote:
>> Thanks Peter! These are really really helpful.
>>
>> I am trying to think through the case of a group of two events on two
>> separate hardware PMUs. In current implementation, this will not trigger
>> move_group,
>
> Right, currently this is disallowed (or should be, I'll need to double
> check the code).
>
>> so they will not be able to rotate together? And actually,
>> they may not be able to run at all? Maybe this case is never supported?
>
> Indeed, we do not allow mixing events of different PMUs, with the
> explicit exception of software events. Since software events must always
> schedule, they're allowed to be fitted into any group.
>
>> On the other hand, would something like this work:
>>
>> perf_cpu_context <-[1:2]-> perf_event_context <-[1:n]-> perf_event
>>         |                                                    |
>>         `----------[1:n]------> pmu <------[1:n]-------------'
>>

After reading the code more, I think the idea in the figure above is
similar to this patch. The "pmu" in the figure is effectively
perf_cpu_pmu_context, although I was originally thinking of something
similar to the current pmu struct (which is not in the figure above).

I spent about two hours here trying to explain my idea, but I ended
up deleting everything I typed and agreeing with almost all of your
design decisions.

I just realized that, as long as we don't allow a group of events on
two different hardware PMUs, the design of this patch works very well:
rotating multiple PMUs at the same time is not necessary.
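
To spell that grouping rule out from the user side (this is just an
illustration I put together, not part of the patch; the event choices
and the helper below are arbitrary): a software event can join a
hardware leader's group via group_fd because it can always schedule,
while a group whose members sit on two different hardware PMUs is
what gets rejected.

/*
 * Illustration only: a hardware leader (cycles) plus a software member
 * (page faults) in one group is fine, because software events always
 * schedule.  A member from a second hardware PMU is what the core
 * refuses (or should refuse).
 */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
			   int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int open_mixed_group(void)
{
	struct perf_event_attr hw = {
		.type		= PERF_TYPE_HARDWARE,
		.size		= sizeof(struct perf_event_attr),
		.config		= PERF_COUNT_HW_CPU_CYCLES,
		.disabled	= 1,
	};
	struct perf_event_attr sw = {
		.type		= PERF_TYPE_SOFTWARE,
		.size		= sizeof(struct perf_event_attr),
		.config		= PERF_COUNT_SW_PAGE_FAULTS,
	};
	int leader, member;

	leader = perf_event_open(&hw, 0, -1, -1, 0);	 /* group leader */
	member = perf_event_open(&sw, 0, -1, leader, 0); /* joins the group */

	return (leader < 0 || member < 0) ? -1 : leader;
}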

The only suggestion I have right now is about which struct owns which
data (a rough sketch follows the list):

1. perf_cpu_context owns two perf_event_context: ctx and *task_ctx.
   This is the same as right now.
2. perf_event_context owns multiple perf_event_pmu_context:
   one perf_event_pmu_context for software groups;
   one perf_event_pmu_context for each hardware PMU.
3. perf_event_pmu_context owns the RB tree of events. Since we don't
   need rotation across multiple hardware PMUs, rotation stays within
   the same perf_event_pmu_context.
4. perf_cpu_context owns multiple perf_cpu_pmu_context:
   one perf_cpu_pmu_context for each hardware PMU.
   perf_cpu_pmu_context is not needed for software-only groups(?).
5. perf_cpu_pmu_context has two pointers to perf_event_pmu_context.
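
To make 1-5 a bit more concrete, here is a very rough sketch of the
ownership I have in mind. The field names and extra members here are
placeholders I made up (leaning on existing kernel types), not taken
from the patch, and the real structs obviously carry a lot more state:

/* Rough sketch of the proposed ownership; placeholder fields only. */

struct perf_event_pmu_context {			/* one per (context, PMU) pair */
	struct pmu			*pmu;
	struct perf_event_groups	pinned_groups;		/* RB tree of events (3.) */
	struct perf_event_groups	flexible_groups;	/* rotation stays in here (3.) */
	struct list_head		pmu_ctx_entry;		/* on ctx->pmu_ctx_list */
};

struct perf_event_context {			/* task or CPU context */
	struct perf_event_pmu_context	sw_ctx;		/* software groups (2.) */
	struct list_head		pmu_ctx_list;	/* one entry per hardware PMU (2.) */
};

struct perf_cpu_pmu_context {			/* one per (CPU, hardware PMU) (4.) */
	struct perf_event_pmu_context	*epc;		/* CPU context side (5.) */
	struct perf_event_pmu_context	*task_epc;	/* current task side (5.) */
};

struct perf_cpu_context {
	struct perf_event_context	ctx;		/* CPU events (1.) */
	struct perf_event_context	*task_ctx;	/* current task's events (1.) */
	struct list_head		cpu_pmu_ctx_list; /* its perf_cpu_pmu_contexts (4.) */
};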


The following diff (on top of this patch) shows the idea above.
I don't think it changes any mechanism, but it feels simpler to me.

Thanks,
Song

diff --git i/include/linux/perf_event.h w/include/linux/perf_event.h
index 462315239f8f..b15e679d4802 100644
--- i/include/linux/perf_event.h
+++ w/include/linux/perf_event.h
@@ -762,10 +762,7 @@ struct perf_event_context {
 	struct mutex			mutex;
 
 	struct list_head		pmu_ctx_list;
-
-	struct perf_event_groups	pinned_groups;
-	struct perf_event_groups	flexible_groups;
-	struct list_head		event_list;
+	struct perf_event_pmu_context	sw_ctx;
 
 	int				nr_events;
 	int				nr_active;
@@ -806,7 +803,7 @@ struct perf_event_context {
 #define PERF_NR_CONTEXTS	4
 
 struct perf_cpu_pmu_context {
-	struct perf_event_pmu_context	epc;
+	struct perf_event_pmu_context	*epc; /* I am still debating this one */
 	struct perf_event_pmu_context	*task_epc;
 
 	struct list_head		sched_cb_entry;
@@ -827,6 +824,7 @@ struct perf_cpu_pmu_context {
 struct perf_cpu_context {
 	struct perf_event_context	ctx;
 	struct perf_event_context	*task_ctx;
+	struct list_head		list_of_perf_cpu_pmu_context; /* may be removed? */
 
 #ifdef CONFIG_CGROUP_PERF
 	struct perf_cgroup		*cgrp;
@@ -834,6 +832,10 @@ struct perf_cpu_context {
 #endif
 
 	int				online;
+
+	struct perf_event_groups	pinned_groups;
+	struct perf_event_groups	flexible_groups;
+	struct list_head		event_list;
 };
 
 struct perf_output_handle {