Re: [PATCH v4 00/10] sched/fair: rework the CFS load balance

From: Vincent Guittot
Date: Thu Oct 24 2019 - 10:59:22 EST


On Thu, 24 Oct 2019 at 15:47, Phil Auld <pauld@xxxxxxxxxx> wrote:
>
> On Thu, Oct 24, 2019 at 08:38:44AM -0400 Phil Auld wrote:
> > Hi Vincent,
> >
> > On Mon, Oct 21, 2019 at 10:44:20AM +0200 Vincent Guittot wrote:
> > > On Mon, 21 Oct 2019 at 09:50, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > > >

[...]

> > > > A full run on Mel Gorman's magic scalability test-suite would be super
> > > > useful ...
> > > >
> > > > Anyway, please be on the lookout for such performance regression reports.
> > >
> > > Yes, I monitor the regressions on the mailing list.
> >
> >
> > Our kernel perf tests show good results across the board for v4.
> >
> > The issue we hit on the 8-node system is fixed. Thanks!
> >
> > As we didn't see the fairness issue, I don't expect the results to be
> > that different on v4a (with the followup patch) but those tests are
> > queued up now and we'll see what they look like.
> >
>
> Initial results with fix patch (v4a) show that the outlier issues on
> the 8-node system have returned. Median time for 152 and 156 threads
> (160-CPU system) goes up significantly, and the worst case goes from 340
> and 250 sec to 550 sec for both. It also doubles, from 150 to 300, for 144

For v3, you had a 4x slowdown, IIRC.


> threads. These look more like the results from v3.

OK. For v3, we were not sure that your use case (UC) triggers the slow
path, but it seems that we now have the confirmation.
The problem happens only on this 8-node, 160-core system, doesn't it?

The fix favors the local group, so your UC seems to prefer spreading
tasks at wake-up.
If you have any traces that you can share, they could help to
understand what's going on. I will try to reproduce the problem on my
system.

>
> We're re-running the test to get more samples.

Thanks
Vincent

>
>
> Other tests and systems were still fine.
>
>
> Cheers,
> Phil
>
>
> > Numbers for my specific testcase (the cgroup imbalance) are basically
> > the same as I posted for v3 (plus the better 8-node numbers). I.e. this
> > series solves that issue.
> >
> >
> > Cheers,
> > Phil
> >
> >
> > >
> > > >
> > > > Also, we seem to have grown a fair amount of these TODO entries:
> > > >
> > > > kernel/sched/fair.c: * XXX borrowed from update_sg_lb_stats
> > > > kernel/sched/fair.c: * XXX: only do this for the part of runnable > running ?
> > > > kernel/sched/fair.c: * XXX illustrate
> > > > kernel/sched/fair.c: } else if (sd_flag & SD_BALANCE_WAKE) { /* XXX always ? */
> > > > kernel/sched/fair.c: * can also include other factors [XXX].
> > > > kernel/sched/fair.c: * [XXX expand on:
> > > > kernel/sched/fair.c: * [XXX more?]
> > > > kernel/sched/fair.c: * [XXX write more on how we solve this.. _after_ merging pjt's patches that
> > > > kernel/sched/fair.c: * XXX for now avg_load is not computed and always 0 so we
> > > > kernel/sched/fair.c: /* XXX broken for overlapping NUMA groups */
> > > >
> > >
> > > I will have a look :-)
> > >
> > > > :-)
> > > >
> > > > Thanks,
> > > >
> > > > Ingo
> >