Re: [PATCH] perf cgroups: Don't rotate events for cgroups unnecessarily

From: Ganapatrao Kulkarni
Date: Fri Aug 23 2019 - 06:44:01 EST


Hi,

We are seeing regression with our uncore perf driver(Marvell's
ThunderX2, ARM64 server platform) on 5.3-Rc1.
After bisecting, it turned out to be this patch causing the issue.

Test case:
Load module and run perf for more than 4 events( we have 4 counters,
event multiplexing takes place for more than 4 events), then unload
module.
With this sequence of testing, the system hangs(soft lockup) after 2
or 3 iterations. Same test runs for hours on 5.2.

while [ 1 ]
do
rmmod thunderx2_pmu
modprobe thunderx2_pmu
perf stat -a -e \
uncore_dmc_0/cnt_cycles/,\
uncore_dmc_0/data_transfers/,\
uncore_dmc_0/read_txns/,\
uncore_dmc_0/config=0xE/,\
uncore_dmc_0/write_txns/ sleep 1
sleep 2
done

Is this patch tested adequately?

On Fri, Jun 28, 2019 at 3:18 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> group_index On Mon, Jun 24, 2019 at 12:55 AM Peter Zijlstra
> <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Jun 21, 2019 at 11:01:29AM -0700, Ian Rogers wrote:
> > > On Fri, Jun 21, 2019 at 1:24 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Sat, Jun 01, 2019 at 01:27:22AM -0700, Ian Rogers wrote:
> > > > > @@ -3325,6 +3331,15 @@ static int flexible_sched_in(struct perf_event *event, void *data)
> > > > > sid->can_add_hw = 0;
> > > > > }
> > > > >
> > > > > + /*
> > > > > + * If the group wasn't scheduled then set that multiplexing is necessary
> > > > > + * for the context. Note, this won't be set if the event wasn't
> > > > > + * scheduled due to event_filter_match failing due to the earlier
> > > > > + * return.
> > > > > + */
> > > > > + if (event->state == PERF_EVENT_STATE_INACTIVE)
> > > > > + sid->ctx->rotate_necessary = 1;
> > > > > +
> > > > > return 0;
> > > > > }
> > > >
> > > > That looked odd; which had me look harder at this function which
> > > > resulted in the below. Should we not terminate the context interation
> > > > the moment one flexible thingy fails to schedule?
> > >
> > > If we knew all the events were hardware events then this would be
> > > true, as there may be software events that always schedule then the
> > > continued iteration is necessary.
> >
> > But this is the 'old' code, where this is guaranteed by the context.
> > That is, if this is a hardware context; there wil only be software
> > events due to them being in a group with hardware events.
> >
> > If this is a software group, then we'll never fail to schedule and we'll
> > not get in this branch to begin with.
> >
> > Or am I now confused for having been staring at two different code-bases
> > at the same time?
>
> I believe you're right and I'd overlooked this. I think there is a
> more efficient version of this code possible that I can follow up
> with. There are 3 perf_event_contexts, potentially a sw and hw context
> within the task_struct and one per-CPU in perf_cpu_context. With this
> change I'm focussed on improving rotation of cgroup events that appear
> system wide within the per-CPU context. Without cgroup events the
> system wide events don't need to be iterated over during scheduling
> in. The branch that can set rotate_necessary will only be necessary
> for the task hardware events. For system wide events, considered with
> cgroup mode scheduling in, the branch is necessary as rotation may be
> necessary. It'd be possible to have variants of flexible_sched_in that
> behave differently in the task software event context, and the system
> wide and task hardware contexts.
>
> I have a series of patches related to Kan Liang's cgroup
> perf_event_groups improvements. I'll mail these out and see if I can
> avoid the branch in the task software event context case.
>
> Thanks,
> Ian

Thanks,
Ganapat