Re: [RFC][PATCH 02/12] perf: Fix cgroup event scheduling

From: Stephane Eranian
Date: Mon Jan 11 2016 - 14:43:46 EST


Peter,

On Mon, Jan 11, 2016 at 8:25 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> There appears to be a problem in __perf_event_task_sched_in() wrt
> cgroup event scheduling.
>
> The normal event scheduling order is:
>
> CPU pinned
> Task pinned
> CPU flexible
> Task flexible
>
> And since perf_cgroup_sched*() only schedules the cpu context, we must
> call this _before_ adding the task events.
>
I understand this, but I am trying to understand why a cgroup system-wide
event would be treated differently from regular system-wide events w.r.t.
task events here. If I do a cgroup flexible event and a task pinned event,
what happens?
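
To make the question concrete, here is a rough sketch (my reading of the
patch, not code copied from it) of the resulting call order once
perf_cgroup_sched_in() moves to the top:

        /*
         * Sketch only: the cgroup cpuctx, pinned and flexible together,
         * is now scheduled before any per-task context is touched.
         */
        perf_cgroup_sched_in(prev, task);       /* cpu pinned + cpu flexible */

        for_each_task_context_nr(ctxn) {
                ctx = task->perf_event_ctxp[ctxn];
                if (ctx)
                        perf_event_context_sched_in(ctx, task); /* task pinned + task flexible */
        }

So I am asking whether a cgroup flexible event can now take a counter
before the task pinned events get their turn.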

> Note: double check what happens on the ctx switch optimization where
> the task ctx isn't scheduled.
>
> Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
> kernel/events/core.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2804,6 +2804,16 @@ void __perf_event_task_sched_in(struct t
>  	struct perf_event_context *ctx;
>  	int ctxn;
>
> +	/*
> +	 * If cgroup events exist on this CPU, then we need to check if we have
> +	 * to switch in PMU state; cgroup events are system-wide mode only.
> +	 *
> +	 * Since cgroup events are CPU events, we must schedule these in before
> +	 * we schedule in the task events.
> +	 */
> +	if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
> +		perf_cgroup_sched_in(prev, task);
> +
>  	for_each_task_context_nr(ctxn) {
>  		ctx = task->perf_event_ctxp[ctxn];
>  		if (likely(!ctx))
> @@ -2811,13 +2821,6 @@ void __perf_event_task_sched_in(struct t
>
>  		perf_event_context_sched_in(ctx, task);
>  	}
> -	/*
> -	 * if cgroup events exist on this CPU, then we need
> -	 * to check if we have to switch in PMU state.
> -	 * cgroup event are system-wide mode only
> -	 */
> -	if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
> -		perf_cgroup_sched_in(prev, task);
>
>  	if (atomic_read(&nr_switch_events))
>  		perf_event_switch(task, prev, true);
>
>
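
On the "ctx switch optimization" note: if I remember the cgroup path right
(rough sketch below, names and signatures from memory, so possibly not
exact), perf_cgroup_sched_in() only switches PMU state when prev and next
run in different perf cgroups, independent of whether the task ctx sched-in
gets skipped:

        /*
         * Sketch from memory, not copied from the tree: the cgroup
         * sched_in path compares the two tasks' cgroups and only calls
         * perf_cgroup_switch() when they differ.
         */
        rcu_read_lock();
        if (perf_cgroup_from_task(prev, NULL) != perf_cgroup_from_task(task, NULL))
                perf_cgroup_switch(task, PERF_CGROUP_SWIN);
        rcu_read_unlock();

At least that is my recollection; worth re-checking against the current tree.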