[PATCH v3 0/2] sched: Consider CPU contention in frequency, EAS max util & load-balance busiest CPU selection

From: Dietmar Eggemann
Date: Mon May 15 2023 - 08:22:59 EST


This series implements the idea of factoring CPU runnable_avg into the
CPU utilization getter functions (so-called 'runnable boosting') as a
way to consider CPU contention for:

(a) CPU frequency
(b) EAS' max util and
(c) 'migrate_util' type load-balance busiest CPU selection.
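
To illustrate the idea (this is not the code from the patches), here is
a minimal user-space sketch of a utilization getter with optional
'runnable boosting'. The struct, the function names and the clamp to
CPU capacity are assumptions made for illustration only; the
'max(util_avg, runnable_avg)' step is the part this series describes.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-CPU CFS signals. */
struct cfs_signals {
	unsigned long util_avg;     /* roughly: time spent running */
	unsigned long runnable_avg; /* roughly: time spent running or waiting to run */
	unsigned long capacity;     /* assumed upper bound for the returned value */
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Sketch of a utilization getter: boost selects 'runnable boosting'. */
static unsigned long cpu_util_sketch(const struct cfs_signals *s, bool boost)
{
	unsigned long util = s->util_avg;

	/* Under contention runnable_avg exceeds util_avg, so it wins here. */
	if (boost)
		util = max_ul(util, s->runnable_avg);

	/* Assumption: never report more than the CPU can deliver. */
	return min_ul(util, s->capacity);
}
```

With util_avg = 300 and runnable_avg = 700 on a CPU of capacity 1024,
the sketch returns 300 without boosting and 700 with it, so a contended
CPU looks busier to its consumers.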

Tests:

for (a) and (b):

The testcase is Jankbench (all subtests, 10 iterations) on Pixel6
(Android 12) with a mainline v5.18 kernel and forward-ported task
scheduler patches.

Uclamp has been deactivated so that the Android Dynamic Performance
Framework (ADPF) 'CPU performance hints' feature (Userspace task
boosting via uclamp_min) does not interfere.

Max_frame_duration:
+-----------------+------------+
|     kernel      | value [ms] |
+-----------------+------------+
|      base       | 163.061513 |
|    runnable     | 161.991705 |
+-----------------+------------+

Mean_frame_duration:
+-----------------+------------+----------+
|     kernel      | value [ms] | diff [%] |
+-----------------+------------+----------+
|      base       |    18.0    |   0.0    |
|    runnable     |    12.7    |  -29.43  |
+-----------------+------------+----------+

Jank percentage (Jank deadline 16ms):
+-----------------+------------+----------+
|     kernel      | value [%]  | diff [%] |
+-----------------+------------+----------+
|      base       |    3.6     |   0.0    |
|    runnable     |    1.0     |  -68.86  |
+-----------------+------------+----------+

Power usage [mW] (total - all CPUs):
+-----------------+------------+----------+
|     kernel      | value [mW] | diff [%] |
+-----------------+------------+----------+
|      base       |   129.5    |   0.0    |
|    runnable     |   134.3    |  3.71*   |
+-----------------+------------+----------+

* Power usage went up from 129.3 mW (-0.15%) in v1 to 134.3 mW (3.71%),
  whereas all the other benchmark numbers stayed roughly the same. This
  is probably because 'runnable boosting' is now used for EAS max util
  as well, so tasks more often end up running on non-little CPUs.

for (c):

The testcase is 'perf bench sched messaging' on Arm64 Ampere Altra with
160 CPUs (sched domains = {MC, DIE, NUMA}), which shows a small
improvement (mean elapsed time drops by roughly 10%):

perf stat --null --repeat 10 -- perf bench sched messaging -t -g 1 -l 2000

0.4869 +- 0.0173 seconds time elapsed (+- 3.55%) ->
0.4377 +- 0.0147 seconds time elapsed (+- 3.36%)

Chen Yu tested v1** with schbench, hackbench, netperf and tbench on an
Intel Sapphire Rapids with 2x56C/112T = 224 CPUs which showed no obvious
difference and some small improvements on tbench:

https://lkml.kernel.org/r/ZFSr4Adtx1ZI8hoc@chenyu5-mobl1

** The implementation for (c) hasn't changed in v2.

v1 -> v2:

(1) Refactor CPU utilization getter functions; let cpu_util_cfs() call
    cpu_util_next() (now cpu_util()).

(2) Consider CPU contention in EAS (find_energy_efficient_cpu() ->
    eenv_pd_max_util()) as well as in schedutil (sugov_get_util()), so
    that EAS' and schedutil's views on CPU frequency selection are in
    sync.

(3) Move 'util_avg = max(util_avg, runnable_avg)' from
    cpu_boosted_util_cfs() to cpu_util_next() (now cpu_util()) so that
    EAS can use it too.

(4) Rework patch header.

(5) Add test results: JankbenchX on Pixel6 (to test changes in
    schedutil and EAS) and 'perf bench sched messaging' on Arm64 Ampere
    Altra (for CFS load-balance, find_busiest_queue()).

v2 -> v3:

(1) Move function header from cpu_util_cfs() to cpu_util() and add a
    paragraph about 'runnable boosting'.

(2) Create cpu_util_cfs_boost() and call it for sites which want to use
    'runnable boosting'.

(3) Use regular 'if (boost)' in cpu_util().
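
The resulting split between boosting and non-boosting call sites can be
pictured roughly as follows. This is a simplified user-space model, not
the kernel code: the model_* names, the struct and the one-argument
signatures are hypothetical, and the busiest-CPU loop only mimics the
idea of comparing boosted utilization for 'migrate_util'.

```c
#include <stddef.h>

/* Hypothetical per-CPU signals for the model. */
struct model_cpu {
	unsigned long util_avg;
	unsigned long runnable_avg;
};

/* Common getter: a regular 'if (boost)' decides whether runnable_avg counts. */
static unsigned long model_cpu_util(const struct model_cpu *c, int boost)
{
	unsigned long util = c->util_avg;

	if (boost && c->runnable_avg > util)
		util = c->runnable_avg;

	return util;
}

/* Wrapper for call sites that do not want 'runnable boosting'. */
static unsigned long model_cpu_util_cfs(const struct model_cpu *c)
{
	return model_cpu_util(c, 0);
}

/* Wrapper for call sites that do want 'runnable boosting'. */
static unsigned long model_cpu_util_cfs_boost(const struct model_cpu *c)
{
	return model_cpu_util(c, 1);
}

/* Return the index of the CPU with the highest boosted utilization, or -1. */
static int model_busiest_cpu(const struct model_cpu *cpus, size_t n)
{
	unsigned long busiest_util = 0;
	int busiest = -1;

	for (size_t i = 0; i < n; i++) {
		unsigned long util = model_cpu_util_cfs_boost(&cpus[i]);

		if (busiest < 0 || util > busiest_util) {
			busiest_util = util;
			busiest = (int)i;
		}
	}

	return busiest;
}
```

For three CPUs with (util_avg, runnable_avg) of (500, 500), (400, 900)
and (600, 600), the model picks the second CPU as busiest, whereas a
comparison on plain util_avg would have picked the third.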

Dietmar Eggemann (2):
  sched/fair: Refactor CPU utilization functions
  sched/fair, cpufreq: Introduce 'runnable boosting'

 kernel/sched/cpufreq_schedutil.c |  3 +-
 kernel/sched/fair.c              | 87 ++++++++++++++++++++++++++------
 kernel/sched/sched.h             | 48 +-----------------
 3 files changed, 76 insertions(+), 62 deletions(-)

--
2.25.1