[RFC PATCH] sched/fair: Skip idle CPU search on busy system

From: Shrikanth Hegde
Date: Wed Jul 26 2023 - 05:38:28 EST


When the system is fully busy, there are no idle CPUs. In that case,
load_balance() is called mainly with the CPU_NOT_IDLE type.
should_we_balance() currently searches for an idle CPU regardless. When
the system is 100% busy no such CPU exists, so the idle_cpu() checks can
be skipped. This avoids fetching the rq structures of those CPUs.

This is a minor optimization for a specific case of 100% utilization.

.....
Coming to the current implementation: it is a very basic approach to the
issue and may not be the best way to do this. It works only with
CONFIG_NO_HZ_COMMON. nohz.nr_cpus is globally available information that
tracks idle CPUs. AFAIU there isn't any other such counter; if there is,
we can use that instead. nohz.nr_cpus is atomic, so reading it might be
costly too.
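
As a rough illustration only, the check can be modelled in user space as
below. This is not kernel code: nr_idle_cpus stands in for nohz.nr_cpus,
cpu_is_idle[] stands in for idle_cpu(), and every model_* name and the
idle_lookups counter are hypothetical. The fallback at the end of the scan
is an assumption about the unchanged part of should_we_balance().

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for nohz.nr_cpus: how many CPUs are idle now. */
static atomic_int nr_idle_cpus;
/* Hypothetical stand-in for idle_cpu(cpu), which reads that CPU's rq. */
static bool cpu_is_idle[MODEL_NR_CPUS];
/* Counts per-CPU lookups so the saving is observable. */
static int idle_lookups;

static void model_cpu_enter_idle(int cpu)
{
	cpu_is_idle[cpu] = true;
	atomic_fetch_add(&nr_idle_cpus, 1);
}

static void model_cpu_exit_idle(int cpu)
{
	cpu_is_idle[cpu] = false;
	atomic_fetch_sub(&nr_idle_cpus, 1);
}

static bool model_should_we_balance(int dst_cpu, int balance_cpu, bool dst_busy)
{
	int cpu;

	/* Fast path from the patch: busy destination, no idle CPU anywhere. */
	if (dst_busy && atomic_load(&nr_idle_cpus) == 0)
		return balance_cpu == dst_cpu;

	/* Slow path: the first idle CPU found is the one that should balance. */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		idle_lookups++;
		if (!cpu_is_idle[cpu])
			continue;
		return cpu == dst_cpu;
	}

	/* Assumed fallback when the scan finds no idle CPU. */
	return balance_cpu == dst_cpu;
}
```

With no CPU idle, a call with dst_busy set returns without touching
cpu_is_idle[] at all; once any CPU enters idle, the scan runs as before.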

An alternative would be to add a new attribute to sched_domain and update
it in the CPU idle entry/exit path of each CPU. The advantage is that the
check can be per env->sd instead of global. It is slightly more
complicated, but maybe better. There could be other advantages, such as
limiting the scan at wakeup.
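
A minimal user-space sketch of that alternative, assuming a per-domain idle
counter that is updated while walking up the domain hierarchy. struct
model_sd, its nr_idle_cpus field and the model_sd_* helpers are all
hypothetical; nothing like them exists in sched_domain today.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a sched_domain carrying its own idle count. */
struct model_sd {
	struct model_sd *parent;	/* NULL at the top-level domain */
	atomic_int nr_idle_cpus;	/* idle CPUs spanned by this domain */
};

/* Idle entry: account the CPU in its base domain and every ancestor. */
static void model_sd_idle_enter(struct model_sd *sd)
{
	for (; sd; sd = sd->parent)
		atomic_fetch_add(&sd->nr_idle_cpus, 1);
}

/* Idle exit: undo the accounting along the same path. */
static void model_sd_idle_exit(struct model_sd *sd)
{
	for (; sd; sd = sd->parent)
		atomic_fetch_sub(&sd->nr_idle_cpus, 1);
}

/* Per-domain check that could replace the global nohz.nr_cpus test. */
static bool model_sd_fully_busy(struct model_sd *sd)
{
	return atomic_load(&sd->nr_idle_cpus) == 0;
}
```

In this model a domain can be reported as fully busy even when a CPU in a
sibling domain is idle, which the global counter cannot express. The cost
is one atomic update per domain level on every idle entry and exit.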

Your feedback would really help. Does this optimization make sense?

Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxxxxxxx>
---
kernel/sched/fair.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..903d59b5290c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10713,6 +10713,12 @@ static int should_we_balance(struct lb_env *env)
return 1;
}

+#ifdef CONFIG_NO_HZ_COMMON
+ /* If the system is fully busy, it's better to skip the idle checks */
+ if (env->idle == CPU_NOT_IDLE && atomic_read(&nohz.nr_cpus) == 0)
+ return group_balance_cpu(sg) == env->dst_cpu;
+#endif
+
/* Try to find first idle CPU */
for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
if (!idle_cpu(cpu))
--
2.31.1