Re: [PATCH 1/1] sched: Consider CPU contention in frequency & load-balance busiest CPU selection

From: Chen Yu
Date: Fri May 05 2023 - 03:10:58 EST


On 2023-04-06 at 17:50:30 +0200, Dietmar Eggemann wrote:
> Use new cpu_boosted_util_cfs() instead of cpu_util_cfs().
>
> The former returns max(util_avg, runnable_avg) capped by max CPU
> capacity. CPU contention is thereby considered through runnable_avg.
>
> The change in load-balance only affects migration type `migrate_util`.
>
> Suggested-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
>
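As I read the description above, the boosted value is simply the max of the
two PELT signals clamped to the CPU's capacity. A minimal standalone sketch
of that computation (the function and parameter names are mine, not the
exact kernel helper):

	/*
	 * Sketch of the described boost: take the max of util_avg and
	 * runnable_avg, then cap the result at the CPU's max capacity.
	 */
	static unsigned long boosted_util(unsigned long util_avg,
					  unsigned long runnable_avg,
					  unsigned long max_capacity)
	{
		unsigned long boosted;

		boosted = util_avg > runnable_avg ? util_avg : runnable_avg;

		return boosted < max_capacity ? boosted : max_capacity;
	}

Since runnable_avg also accounts for time tasks spend runnable but waiting
on the runqueue, the boosted value rises under contention even when
util_avg alone would not.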
Tested on an Intel Sapphire Rapids system with 2x56C/112T = 224 CPUs.
The test checks whether there is any impact from the change to
find_busiest_queue(), so it was run with the performance cpufreq governor.
The baseline is the 6.3 sched/core branch on top of
commit 67fff302fc445a ("sched/fair: Introduce SIS_CURRENT to wake up
short task on current CPU"), compared against the same code with the
current patch applied.

In summary, no obvious difference was observed, apart from some small
improvements on tbench so far.
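In the tables below, each baseline is normalized to 1.00, compare% is the
relative change of the patched kernel versus the baseline mean, and std%
gives the standard deviation as a percentage of the mean. A minimal sketch
of that arithmetic (the helper name is mine, not from the test scripts):

	/* Relative change of the patched result vs. the baseline mean, in percent. */
	static double compare_pct(double baseline_mean, double patched_mean)
	{
		return (patched_mean / baseline_mean - 1.0) * 100.0;
	}

The detailed results: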

schbench(latency)
========
case        load         baseline(std%)  compare%( std%)
normal      1-mthreads    1.00 (  0.00)   +1.75 (  1.26)
normal      2-mthreads    1.00 (  5.84)   -5.41 (  2.09)
normal      4-mthreads    1.00 (  2.59)   -3.67 (  1.25)
normal      8-mthreads    1.00 (  2.46)   +3.48 (  0.00)

hackbench(throughput)
=========
case             load        baseline(std%)  compare%( std%)
process-pipe     1-groups     1.00 (  0.26)   +0.73 (  2.18)
process-pipe     2-groups     1.00 (  3.91)   +1.96 (  6.17)
process-pipe     4-groups     1.00 (  3.59)   -2.56 (  5.18)
process-sockets  1-groups     1.00 (  0.97)   +1.83 (  0.80)
process-sockets  2-groups     1.00 (  6.09)   +3.83 (  8.19)
process-sockets  4-groups     1.00 (  0.87)   -5.94 (  1.86)
threads-pipe     1-groups     1.00 (  0.44)   +0.23 (  0.17)
threads-pipe     2-groups     1.00 (  1.18)   +1.41 (  1.16)
threads-pipe     4-groups     1.00 (  2.40)   +1.34 (  1.86)
threads-sockets  1-groups     1.00 (  1.97)   -2.27 (  1.44)
threads-sockets  2-groups     1.00 (  3.85)   -2.44 (  2.42)
threads-sockets  4-groups     1.00 (  1.18)   -2.93 (  1.09)

netperf(throughput)
=======
case     load          baseline(std%)  compare%( std%)
TCP_RR   56-threads     1.00 (  4.35)   +2.50 (  4.73)
TCP_RR   112-threads    1.00 (  4.05)   +2.12 (  4.05)
TCP_RR   168-threads    1.00 (  5.10)   +0.10 (  3.70)
TCP_RR   224-threads    1.00 (  3.37)   +0.52 (  2.79)
TCP_RR   280-threads    1.00 ( 10.04)   -0.36 ( 10.14)
TCP_RR   336-threads    1.00 ( 17.45)   +0.07 ( 19.04)
TCP_RR   392-threads    1.00 ( 27.89)   -0.00 ( 30.48)
TCP_RR   448-threads    1.00 ( 38.99)   +0.29 ( 33.93)
UDP_RR   56-threads     1.00 (  7.98)   -6.91 ( 13.97)
UDP_RR   112-threads    1.00 ( 18.06)   +5.83 ( 27.46)
UDP_RR   168-threads    1.00 ( 17.45)   -3.00 ( 29.40)
UDP_RR   224-threads    1.00 ( 21.15)   -3.99 ( 28.64)
UDP_RR   280-threads    1.00 ( 19.74)   -3.20 ( 29.57)
UDP_RR   336-threads    1.00 ( 22.26)   -4.24 ( 32.35)
UDP_RR   392-threads    1.00 ( 35.88)   -5.53 ( 35.76)
UDP_RR   448-threads    1.00 ( 40.38)   -2.65 ( 48.57)

tbench(throughput)
======
case      load          baseline(std%)  compare%( std%)
loopback  56-threads     1.00 (  0.74)   +2.54 (  0.84)
loopback  112-threads    1.00 (  0.37)   -2.26 (  1.01)
loopback  168-threads    1.00 (  0.49)   +1.44 (  3.05)
loopback  224-threads    1.00 (  0.20)   +6.05 (  0.54)
loopback  280-threads    1.00 (  0.44)   +5.35 (  0.05)
loopback  336-threads    1.00 (  0.02)   +5.03 (  0.06)
loopback  392-threads    1.00 (  0.07)   +5.03 (  0.04)
loopback  448-threads    1.00 (  0.06)   +4.86 (  0.22)

thanks,
Chenyu