Re: [PATCH v2] perf/core: set cgroup in cpu contexts for new cgroup events

From: David Carrillo-Cisneros
Date: Wed Aug 03 2016 - 14:51:11 EST


Hi Vegard,

I don't think this patch fixes your bug, but it touches some code that
may be related.

David

On Wed, Aug 3, 2016 at 8:46 AM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> Hi,
>
> On 2 August 2016 at 09:48, David Carrillo-Cisneros <davidcc@xxxxxxxxxx> wrote:
>> There is an optimization in perf_cgroup_sched_{in,out} that skips the
>> switch of cgroup events if the old and new cgroups in a task switch are
>> the same. This optimization interacts with the current code in two ways
>> that cause a cpu context's cgroup (cpuctx->cgrp) to be NULL even if a
>> cgroup event matches the current task. These are:
>>
>> 1. On creation of the first cgroup event in a CPU: In current code,
>> cpuctx->cpu is only set in perf_cgroup_sched_in, but due to the
>> aforesaid optimization, perf_cgroup_sched_in will run until the next
>> cgroup switches in that cpu. This may happen late or never happen,
>> depending on system's number of cgroups, cpu load, etc.
>>
>> 2. On deletion of the last cgroup event in a cpuctx: In list_del_event,
>> cpuctx->cgrp is set NULL. Any new cgroup event will not be sched in
>> because cpuctx->cgrp == NULL until a cgroup switch occurs and
>> perf_cgroup_sched_in is executed (updating cpuctx->cgrp).
>>
>> This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
>> mirroring what list_del_event does when removing a cgroup event from CPU
>> context, as introduced in:
>> commit 68cacd29167b ("perf_events: Fix stale ->cgrp pointer in
>> update_cgrp_time_from_cpuctx()")
>>
>> With this patch, cpuctx->cgrp is always set/clear when installing/removing
>> the first/last cgroup event in/from the cpu context. With cpuctx->cgrp
>> correctly set, event_filter_match works as intended when events are
>> sched in/out.
>>
>> The problem is easy to observe in a machine with only one cgroup:
>>
>> $ perf stat -e cycles -I 1000 -C 0 -G /
>> # time counts unit events
>> 1.000161699 <not counted> cycles /
>> 2.000355591 <not counted> cycles /
>> 3.000565154 <not counted> cycles /
>> 4.000951350 <not counted> cycles /
>>
>> After the fix, the output is as expected:
>>
>> $ perf stat -e cycles -I 1000 -a -G /
>> # time counts unit events
>> 1.004699159 627342882 cycles /
>> 2.007397156 615272690 cycles /
>> 3.010019057 616726074 cycles /
>>
>> Rebased at peterz/queue/perf/core.
>>
>> Changes in v2:
>> - Fix build error when no CONFIG_CGROUP_PERF.
>> - Unify add and del cases into list_update_cgroup_event.
>> - Remove cgroup exclusive variables from builds
>> without CONFIG_CGROUP_PERF.
>>
>> Signed-off-by: David Carrillo-Cisneros <davidcc@xxxxxxxxxx>
>
> Is this supposed to fix the bug I reported here?
>
> https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1197757.html
>
> If so, you may want to add:
>
> 1) Fixes: f2fb6bef92514 ("perf/core: Optimize side-band event delivery")
> 2) Reported-by: Vegard Nossum <vegard.nossum@xxxxxxxxxx>
> 3) a link to the thread
>
> and I can give it a test to see if it fixes the problem I was running into.
>
> If not, please ignore :-)
>
> Thanks,
>
>
> Vegard