Re: [PATCH 1/2] sched/fair: Couple wakee flips with heavy wakers

From: Mel Gorman
Date: Fri Oct 22 2021 - 07:05:40 EST


On Fri, Oct 22, 2021 at 12:26:08PM +0200, Mike Galbraith wrote:
> On Thu, 2021-10-21 at 15:56 +0100, Mel Gorman wrote:
> >
> > From additional tests on various servers, the impact is machine dependent
> > but generally this patch improves the situation.
> >
> > hackbench-process-pipes
> >                           5.15.0-rc3             5.15.0-rc3
> >                              vanilla  sched-wakeeflips-v1r1
> > Amean     1        0.3667 (   0.00%)      0.3890 (  -6.09%)
> > Amean     4        0.5343 (   0.00%)      0.5217 (   2.37%)
> > Amean     7        0.5300 (   0.00%)      0.5387 (  -1.64%)
> > Amean     12       0.5737 (   0.00%)      0.5443 (   5.11%)
> > Amean     21       0.6727 (   0.00%)      0.6487 (   3.57%)
> > Amean     30       0.8583 (   0.00%)      0.8033 (   6.41%)
> > Amean     48       1.3977 (   0.00%)      1.2400 *  11.28%*
> > Amean     79       1.9790 (   0.00%)      1.8200 *   8.03%*
> > Amean     110      2.8020 (   0.00%)      2.5820 *   7.85%*
> > Amean     141      3.6683 (   0.00%)      3.2203 *  12.21%*
> > Amean     172      4.6687 (   0.00%)      3.8200 *  18.18%*
> > Amean     203      5.2183 (   0.00%)      4.3357 *  16.91%*
> > Amean     234      6.1077 (   0.00%)      4.8047 *  21.33%*
> > Amean     265      7.1313 (   0.00%)      5.1243 *  28.14%*
> > Amean     296      7.7557 (   0.00%)      5.5940 *  27.87%*
> >
> > While different machines showed different results, in general
> > there were far fewer CPU migrations of tasks
>
> Patchlet helped hackbench? That's.. unexpected (at least by me).
>

I didn't analyse it in depth and other machines do not show as dramatic
a difference, but it's likely due to the timing of tasks getting wakeup
preempted. On a 2-socket Cascade Lake machine the difference ranged from
-7.4% to 7.66% depending on group count. The second biggest loss was
-0.71% and there were more gains than losses. In each case, CPU
migrations and system CPU usage were reduced.
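
For anyone comparing numbers across machines, the percentage columns in
the tables can be reproduced approximately from the printed means. This
is only a sketch, assuming the convention the tables appear to follow: a
positive percentage is an improvement, so hackbench (time, lower is
better) uses (base - new) / base and tbench (throughput, higher is
better) uses (new - base) / base. The function name is made up for
illustration, and results can differ from the tables in the last digit
because the printed means are already rounded.

```python
def gain_percent(baseline, patched, lower_is_better):
    """Relative change in percent, signed so that positive means improvement.

    Assumed convention, inferred from the tables in this thread.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    if lower_is_better:
        delta = baseline - patched   # e.g. hackbench elapsed time
    else:
        delta = patched - baseline   # e.g. tbench throughput
    return 100.0 * delta / baseline


# hackbench-process-pipes, 296 groups (vanilla vs patched)
print(round(gain_percent(7.7557, 5.5940, lower_is_better=True), 2))
# tbench4, 64 clients (vanilla vs patched)
print(round(gain_percent(27313.67, 24107.81, lower_is_better=False), 2))
```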

The big difference here is likely because the machine is Zen 3 and has
multiple LLCs per socket, so it suffers more if there are imbalances between
LLCs that wouldn't be visible on most Intel machines with 1 LLC per socket.
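
To check how many LLCs per socket a given test machine has, something
like the following rough sketch can be used. It reads the standard sysfs
topology files and assumes that cache index3 is the last-level cache,
which is typical on current x86 machines but not guaranteed. The
function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def llcs_per_socket(cpus):
    """cpus maps cpu id -> (package id, LLC shared_cpu_list string).

    Returns a dict of package id -> number of distinct LLCs seen.
    """
    seen = defaultdict(set)
    for package, shared in cpus.values():
        seen[package].add(shared)
    return {package: len(llcs) for package, llcs in seen.items()}


def read_sysfs(root="/sys/devices/system/cpu"):
    """Collect (package id, L3 shared_cpu_list) for each online CPU."""
    cpus = {}
    for cpu_dir in Path(root).glob("cpu[0-9]*"):
        pkg = cpu_dir / "topology" / "physical_package_id"
        # Assumption: index3 is the LLC on this machine.
        llc = cpu_dir / "cache" / "index3" / "shared_cpu_list"
        if pkg.exists() and llc.exists():  # offline CPUs lack these files
            cpus[int(cpu_dir.name[3:])] = (
                int(pkg.read_text()),
                llc.read_text().strip(),
            )
    return cpus


if __name__ == "__main__":
    print(llcs_per_socket(read_sysfs()))
```

A result with more than one LLC for a package id would indicate a
machine where imbalances between LLCs within a socket are possible.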

> > tbench4
> >                            5.15.0-rc3             5.15.0-rc3
> >                               vanilla  sched-wakeeflips-v1r1
> > Hmean     1         824.05 (   0.00%)      802.56 *  -2.61%*
> > Hmean     2        1578.49 (   0.00%)     1645.11 *   4.22%*
> > Hmean     4        2959.08 (   0.00%)     2984.75 *   0.87%*
> > Hmean     8        5080.09 (   0.00%)     5173.35 *   1.84%*
> > Hmean     16       8276.02 (   0.00%)     9327.17 *  12.70%*
> > Hmean     32      15501.61 (   0.00%)    15925.55 *   2.73%*
> > Hmean     64      27313.67 (   0.00%)    24107.81 * -11.74%*
> > Hmean     128     32928.19 (   0.00%)    36261.75 *  10.12%*
> > Hmean     256     35434.73 (   0.00%)    38670.61 *   9.13%*
> > Hmean     512     50098.34 (   0.00%)    53243.75 *   6.28%*
> > Hmean     1024    69503.69 (   0.00%)    67425.26 *  -2.99%*
> >
> > Bit of a mixed bag but wins more than it loses.
>
> Hm. If patchlet repeatably impacts buddy pairs one way or the other,
> it should probably be tossed out the nearest window.
>

I don't see how buddy pairing would be impacted, although there are likely
differences in the degree to which tasks get preempted due to pulling tasks.

--
Mel Gorman
SUSE Labs