Re: [PATCH v4 00/10] sched/fair: rework the CFS load balance

From: Valentin Schneider
Date: Mon Nov 25 2019 - 07:48:34 EST


On 18/10/2019 14:26, Vincent Guittot wrote:
>                tip/sched/core       w/ this patchset     improvement
> schedpipe      53125 +/-0.18%       53443 +/-0.52%       (+0.60%)
>
> hackbench -l (2560/#grp) -g #grp
>  1 groups     1.579 +/-29.16%      1.410 +/-13.46%      (+10.70%)
>  4 groups     1.269 +/-9.69%       1.205 +/-3.27%       (+5.00%)
>  8 groups     1.117 +/-1.51%       1.123 +/-1.27%       (-0.54%)
> 16 groups     1.176 +/-1.76%       1.164 +/-2.42%       (+1.07%)
>
> Unixbench shell8
> 1 test        1963.48 +/-0.36%     1902.88 +/-0.73%     (-3.09%)
> 224 tests     2427.60 +/-0.20%     2469.80 +/-0.42%     (+1.74%)
>
> - large arm64 2 nodes / 224 cores system
>
>                tip/sched/core       w/ this patchset     improvement
> schedpipe      124084 +/-1.36%      124445 +/-0.67%      (+0.29%)
>
> hackbench -l (256000/#grp) -g #grp
>   1 groups    15.305 +/-1.50%      14.001 +/-1.99%      (+8.52%)
>   4 groups    5.959 +/-0.70%       5.542 +/-3.76%       (+6.99%)
>  16 groups    3.120 +/-1.72%       3.253 +/-0.61%       (-4.92%)
>  32 groups    2.911 +/-0.88%       2.837 +/-1.16%       (+2.54%)
>  64 groups    2.805 +/-1.90%       2.716 +/-1.18%       (+3.17%)
> 128 groups    3.166 +/-7.71%       3.891 +/-6.77%       (-22.90%)
> 256 groups    3.655 +/-10.09%      3.185 +/-6.65%       (+12.87%)
>
> dbench
>   1 groups    328.176 +/-0.29%     330.217 +/-0.32%     (+0.62%)
>   4 groups    930.739 +/-0.50%     957.173 +/-0.66%     (+2.84%)
>  16 groups    1928.292 +/-0.36%    1978.234 +/-0.88%    (+2.59%)
>  32 groups    2369.348 +/-1.72%    2454.020 +/-0.90%    (+3.57%)
>  64 groups    2583.880 +/-3.39%    2618.860 +/-0.84%    (+1.35%)
> 128 groups    2256.406 +/-10.67%   2392.498 +/-2.13%    (+6.03%)
> 256 groups    1257.546 +/-3.81%    1674.684 +/-4.97%    (+33.17%)
>
> Unixbench shell8
> 1 test        6944.16 +/-0.02%     6605.82 +/-0.11%     (-4.87%)
> 224 tests     13499.02 +/-0.14%    13637.94 +/-0.47%    (+1.03%)
>
> lkp reported a -10% regression on shell8 (1 test) for v3, which seems
> to be partially recovered on my platform with v4.
>
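
(For anyone re-deriving the "improvement" column: it is just the relative
delta against tip/sched/core, with the sign flipped for hackbench since
those are run times rather than throughput; the -l argument is scaled as
loops/#grp so the total amount of work stays roughly constant across
group counts. A minimal sketch in Python, not from the patchset:

  def improvement_pct(base, patched, lower_is_better=False):
      """Relative improvement over tip/sched/core, in percent.

      schedpipe, dbench and shell8 report throughput (higher is
      better); hackbench reports run time, so the sign is flipped.
      """
      delta = (patched - base) / base * 100.0
      return -delta if lower_is_better else delta

  # A couple of rows from the tables above:
  print(improvement_pct(1257.546, 1674.684))                  # dbench 256 grp: +33.17
  print(improvement_pct(3.166, 3.891, lower_is_better=True))  # hackbench 128 grp: -22.90
)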

I've been busy trying to get some perf numbers on arm64 server-ish
systems, and I finally managed to get some specjbb numbers on TX2 (the
2-node, 224-CPU version, which I suspect is the same as the one you used
above). I only have a limited number of iterations (5, although each
runs for about 2h) because I wanted to get some (usable) results by
today; I'll spin some more during the week.


This is based on the "critical-jOPs" metric, for which (AFAIU) higher is better:

Baseline, SMTOFF:
mean     12156.400000
std        660.640068
min      11016.000000
25%      12158.000000
50%      12464.000000
75%      12521.000000
max      12623.000000

Patches (+ find_idlest_group() fixup), SMTOFF:
mean     12487.250000
std        184.404221
min      12326.000000
25%      12349.250000
50%      12449.500000
75%      12587.500000
max      12724.000000
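
(These are describe()-style summaries over the iterations. With 5
samples the quartiles land exactly on the sorted samples, so the
baseline per-iteration values can be read straight off the summary; a
quick pandas sketch, assuming that's how the stats were produced:

  import pandas as pd

  # With n=5, min/25%/50%/75%/max are exactly the order statistics,
  # so these are the recovered baseline critical-jOPs samples:
  baseline = pd.Series([11016, 12158, 12464, 12521, 12623])
  print(baseline.describe())  # mean 12156.40, std 660.64 -- as above
)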


It looks slightly better overall (mean, stddev), but I'm annoyed by that
low iteration count. I also had some issues with my SMTON run and I only
got numbers for 2 iterations, so I'll respin that before complaining.
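
If it helps put the low iteration count in perspective, the kind of
check I have in mind is a Welch's t-test over the per-iteration numbers;
a sketch using scipy (the sample lists are placeholders, not my actual
per-run data):

  from scipy import stats

  def significant(base_samples, patched_samples, alpha=0.05):
      # Welch's t-test: does the mean difference look real given the
      # small, unequal-variance samples?  Large p => likely just noise.
      t, p = stats.ttest_ind(base_samples, patched_samples, equal_var=False)
      return p < alpha, p

  # e.g. significant(baseline_jops, patched_jops) with the two runs'
  # critical-jOPs lists filled in.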

FWIW the branch I've been using is:

http://www.linux-arm.org/git?p=linux-vs.git;a=shortlog;h=refs/heads/mainline/load-balance/vincent_rework/tip