[PATCH v2 0/2] sched: Consider CPU contention in frequency, EAS max util & load-balance busiest CPU selection

From: Dietmar Eggemann
Date: Fri May 12 2023 - 06:11:48 EST


This is the implementation of the idea to factor in CPU runnable_avg
into the CPU utilization getter functions (so called
'runnable boosting') as a way to consider CPU contention for:

(a) CPU frequency
(b) EAS' max util and
(c) 'migrate_util' type load-balance busiest CPU selection.

Tests:

for (a) and (b):

Testcase is Jankbench (all subtests, 10 iterations) on Pixel6 (Android
12) with mainline v5.18 kernel and forward ported task scheduler
patches.

Uclamp has been deactivated so that the Android Dynamic Performance
Framework (ADPF) 'CPU performance hints' feature (Userspace task
boosting via uclamp_min) does not interfere.

Max_frame_duration:
+-----------------+------------+
| kernel | value [ms] |
+-----------------+------------+
| base | 163.061513 |
| runnable | 161.991705 |
+-----------------+------------+

Mean_frame_duration:
+-----------------+------------+----------+
| kernel | value [ms] | diff [%] |
+-----------------+------------+----------+
| base | 18.0 | 0.0 |
| runnable | 12.7 | -29.43 |
+-----------------+------------+----------+

Jank percentage (Jank deadline 16ms):
+-----------------+------------+----------+
| kernel | value [%] | diff [%] |
+-----------------+------------+----------+
| base | 3.6 | 0.0 |
| runnable | 1.0 | -68.86 |
+-----------------+------------+----------+

Power usage [mW] (total - all CPUs):
+-----------------+------------+----------+
| kernel | value [mW] | diff [%] |
+-----------------+------------+----------+
| base | 129.5 | 0.0 |
| runnable | 134.3 | 3.71* |
+-----------------+------------+----------+

* Power usage went up from 129.3 (-0.15%) in v1 to 134.3 (3.71%) whereas
all the other benchmark numbers stayed roughly the same. This is
probably because of using 'runnable boosting' for EAS max util now as
well and tasks more often end up running on non-little CPUs because of
that.

for (c):

Testcase is 'perf bench sched messaging' on Arm64 Ampere Altra with 160
CPUs (sched domains = {MC, DIE, NUMA}) which shows some small
improvement:

perf stat --null --repeat 10 -- perf bench sched messaging -t -g 1 -l 2000

0.4869 +- 0.0173 seconds time elapsed (+- 3.55%) ->
0.4377 +- 0.0147 seconds time elapsed (+- 3.36%)

Chen Yu tested v1** with schbench, hackbench, netperf and tbench on an
Intel Sapphire Rapids with 2x56C/112T = 224 CPUs which showed no obvious
difference and some small improvements on tbench:

https://lkml.kernel.org/r/ZFSr4Adtx1ZI8hoc@chenyu5-mobl1

** The implementation for (c) hasn't changed in v2.

v1 -> v2:

(1) Refactor CPU utilization getter functions, let cpu_util_cfs() call
cpu_util_next() (now cpu_util()).

(2) Consider CPU contention in EAS (find_energy_efficient_cpu() ->
eenv_pd_max_util()) next to schedutil (sugov_get_util()) as well so
that EAS' and schedutil's views on CPU frequency selection are in
sync.

(3) Move 'util_avg = max(util_avg, runnable_avg)' from
cpu_boosted_util_cfs() to cpu_util_next() (now cpu_util()) so that
EAS can use it too.

(4) Rework patch header.

(5) Add test results (JankbenchX on Pixel6 to test changes in schedutil
and EAS) and 'perf bench sched messaging' on Arm64 Ampere Altra for
CFS load-balance (find_busiest_queue()).


Dietmar Eggemann (2):
sched/fair: Refactor CPU utilization functions
sched/fair, cpufreq: Introduce 'runnable boosting'

kernel/sched/core.c | 2 +-
kernel/sched/cpufreq_schedutil.c | 3 +-
kernel/sched/fair.c | 72 +++++++++++++++++++++++++-------
kernel/sched/sched.h | 49 +---------------------
4 files changed, 63 insertions(+), 63 deletions(-)

--
2.25.1