Re: [PATCH 2/7] sched/balancing: Change 'enum cpu_idle_type' to have more natural definitions

From: Shrikanth Hegde
Date: Sat Mar 02 2024 - 04:05:36 EST




On 3/1/24 4:39 PM, Ingo Molnar wrote:
> The cpu_idle_type enum has the confusingly inverted property
> that 'not idle' is 1, and 'idle' is '0'.
>
> This resulted in a number of unnecessary complications in the code.
>
> Reverse the order, remove the CPU_NOT_IDLE type, and convert
> all code to a natural boolean form.
>
> It's much more readable:
>
> - enum cpu_idle_type idle = this_rq->idle_balance ?
> - CPU_IDLE : CPU_NOT_IDLE;
> -
> + enum cpu_idle_type idle = this_rq->idle_balance;
>
> --------------------------------
>
> - if (env->idle == CPU_NOT_IDLE || !busiest->sum_nr_running)
> + if (!env->idle || !busiest->sum_nr_running)
>
> --------------------------------
>
> And gets rid of the double negation in these usages:
>
> - if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1)
> + if (env->idle && env->src_rq->nr_running <= 1)
>
> Furthermore, this makes code much more obvious where there's
> differentiation between CPU_IDLE and CPU_NEWLY_IDLE.
>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
> ---
>  include/linux/sched/idle.h |  3 +--
>  kernel/sched/fair.c        | 27 ++++++++++++---------------
>  2 files changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
> index 478084f9105e..4a6423700ffc 100644
> --- a/include/linux/sched/idle.h
> +++ b/include/linux/sched/idle.h
> @@ -5,8 +5,7 @@
> #include <linux/sched.h>
>
>  enum cpu_idle_type {
> -	CPU_IDLE,
> -	CPU_NOT_IDLE,
> +	CPU_IDLE = 1,
>  	CPU_NEWLY_IDLE,
>  	CPU_MAX_IDLE_TYPES
>  };

[...]

>  	struct rq *this_rq = this_rq();
> -	enum cpu_idle_type idle = this_rq->idle_balance ?
> -						CPU_IDLE : CPU_NOT_IDLE;
> -
> +	enum cpu_idle_type idle = this_rq->idle_balance;
>  	/*
>  	 * If this CPU has a pending nohz_balance_kick, then do the
>  	 * balancing on behalf of the other idle CPUs whose ticks are

Hi Ingo.
This is more readable code indeed.

But the schedstat area also needs a fix. With the patch applied, /proc/schedstat
shows domain stats only for CPU_IDLE and CPU_NEWLY_IDLE, since stats.c starts
printing from CPU_IDLE.

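To make the stats.c interaction concrete, here is a tiny standalone sketch
(illustrative only: it copies the enum as it looks after the patch and mimics
the loop start in show_schedstat(); it is not the kernel code itself):

#include <stdio.h>

/* Enum as it looks after the patch; the unnamed value 0 is the busy case. */
enum cpu_idle_type {
	CPU_IDLE = 1,
	CPU_NEWLY_IDLE,
	CPU_MAX_IDLE_TYPES
};

int main(void)
{
	enum cpu_idle_type itype;
	int printed = 0;

	/* stats.c starts the print loop at CPU_IDLE, which is now 1, so
	 * index 0 (the implicit busy/not-idle slot) is never visited. */
	for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES; itype++)
		printed++;

	printf("idle types printed per domain: %d (was 3 before the patch)\n",
	       printed);
	return 0;
}

Before the patch the same loop started at CPU_IDLE == 0 and visited all three
slots, which lines up with the field counts in the test below.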

I ran the below test:
echo 1 > /proc/sys/kernel/sched_schedstats
sleep 300
stress-ng --cpu=$(nproc) -t 300
cat /proc/schedstat


6.8-rc5 -- there are 36 fields per domain line.
cpu0 0 0 4400 1485 1624 1229 301472313236 120382198 7714
domain0 00000000,00000000,00000055 1661 1661 0 0 0 0 0 1661 2495 2495 0 0 0 0 0 2495 67 66 1 2 0 0 0 66 0 0 0 0 0 0 0 0 0 133 38 0
domain1 ff000000,00ff0000,ffffffff 382 369 13 13 4 0 2 207 198 195 3 36 0 0 0 195 67 64 3 3 0 0 0 64 4 0 4 0 0 0 0 0 0 124 9 0
domain2 ff00ffff,00ffffff,ffffffff 586 585 1 6 0 0 0 365 118 116 2 96 0 0 0 116 67 67 0 0 0 0 0 67 0 0 0 0 0 0 0 0 0 59 0 0
domain3 ffffffff,ffffffff,ffffffff 481 479 2 58 0 0 0 387 97 97 0 0 0 0 0 96 67 67 0 0 0 0 0 67 0 0 0 0 0 0 0 0 0 79 0 0

+Patch -- there are 28 fields per domain line.
cpu0 0 0 2520 1244 1283 974 2273868054 212337506 6911
domain0 00000000,00000000,00000055 1975 1975 0 0 0 0 0 1975 45 45 0 0 0 0 0 45 0 0 0 0 0 0 0 0 0 65 3 0
domain1 ff000000,00ff0000,ffffffff 441 438 3 3 1 0 0 242 45 43 2 2 0 0 0 43 1 0 1 0 0 0 0 0 0 48 0 0
domain2 ff00ffff,00ffffff,ffffffff 655 654 1 2 0 0 0 468 45 45 0 0 0 0 0 45 0 0 0 0 0 0 0 0 0 152 0 0
domain3 ffffffff,ffffffff,ffffffff 521 521 0 0 0 0 0 472 44 44 0 0 0 0 0 44 0 0 0 0 0 0 0 0 0 44 0 0

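Counting the domain-line fields shows the same thing: each idle type contributes
the 8 load-balance counters that show_schedstat() prints, and the remaining 12
per-domain fields are not per idle type, so (as I count them):

    6.8-rc5: 3 idle types * 8 + 12 = 36 fields
    +Patch:  2 idle types * 8 + 12 = 28 fields

i.e. exactly one idle type's counters are missing from the output.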

I think it's all still getting accounted; it's just not being printed.
With the change below I am able to see all three types, albeit not in the order
described by the schedstat documentation.

diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..f36b54bdd9fa 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -150,7 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)

seq_printf(seq, "domain%d %*pb", dcount++,
cpumask_pr_args(sched_domain_span(sd)));
- for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
+ for (itype = 0; itype < CPU_MAX_IDLE_TYPES;
itype++) {
seq_printf(seq, " %u %u %u %u %u %u %u %u",
sd->lb_count[itype],