Re: [PATCH 2/9] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()

From: Ingo Molnar
Date: Fri Mar 08 2024 - 04:56:02 EST



* Shrikanth Hegde <sshegde@xxxxxxxxxxxxx> wrote:

>
>
> On 3/4/24 3:18 PM, Ingo Molnar wrote:
> > From: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>
> >
> > Shrikanth Hegde reported that show_schedstat() output broke when
> > the ordering of the definitions in 'enum cpu_idle_type' is changed,
> > because show_schedstat() assumed that 'CPU_IDLE' is 0.
> >
> Hi Ingo.
> Feel free to drop me from the changelog.

Yeah - I made you the author of the commit, and indeed it should not refer
to you in the third person. :-) Fixed.

>
> > @@ -150,8 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
> >
> > seq_printf(seq, "domain%d %*pb", dcount++,
> > cpumask_pr_args(sched_domain_span(sd)));
> > - for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
> > - itype++) {
> > + for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
>
>
> It would still not be same order as current documentation of schedstat.
> no? The documentation would need changes too. Change SCHEDSTAT_VERSION to
> 16?

Correct. I've bumped SCHEDSTAT_VERSION up to 16 now, but since it hasn't
been changed for the last 10+ years I'm wondering whether that's the right
thing to do, or whether we should add a quirk to maintain the v15 ordering
instead?
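
For illustration only, such a quirk could be an explicit lookup table that
spells out the v15 column order, so the output no longer depends on the
enum's numeric values. This is a rough userspace sketch, not kernel code and
not part of the patch: v15_order[] and format_v15() are hypothetical names,
and the enum below assumes the post-change ordering shown in the mockup
further down.

```c
#include <stdio.h>

/* Assumed enum ordering after the proposed change. */
enum cpu_idle_type {
	CPU_NOT_IDLE,
	CPU_IDLE,
	CPU_NEWLY_IDLE,
	CPU_MAX_IDLE_TYPES
};

/* Hypothetical quirk table: the v15 column order, spelled out by name. */
static const enum cpu_idle_type v15_order[CPU_MAX_IDLE_TYPES] = {
	CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE,
};

/*
 * Format one counter per idle type into buf, in v15 column order.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_v15(char *buf, size_t len,
		      const unsigned int counts[CPU_MAX_IDLE_TYPES])
{
	int off = 0;

	for (int i = 0; i < CPU_MAX_IDLE_TYPES; i++) {
		int n = snprintf(buf + off, len - off, " %u",
				 counts[v15_order[i]]);

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += n;
	}
	return off;
}
```

With a table like this the columns stay in idle, not-idle, newly-idle order
however the enum is rearranged, at the cost of one more thing to keep in
sync when an idle type is added or removed.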

I think we should also output the actual symbolic cpu_idle_type names into
schedstat, so that tooling (and observant kernel developers) can see the
actual ordering of the [CPU_MAX_IDLE_TYPES] columns.

A new line like this (mockup):

cpu0 0 0 4400 1485 1624 1229 301472313236 120382198 7714
+ cpu_idle_type CPU_IDLE 0 CPU_NOT_IDLE 1 CPU_NEWLY_IDLE 2 CPU_MAX_IDLE_TYPES 3
domain0 00000000,00000000,00000055 1661 1661 0 0 0 0 0 1661 2495 2495 0 0 0 0 0 2495 67 66 1 2 0 0 0 66 0 0 0 0 0 0 0 0 0 133 38 0

... and after the change this would become:

cpu_idle_type CPU_NOT_IDLE 0 CPU_IDLE 1 CPU_NEWLY_IDLE 2 CPU_MAX_IDLE_TYPES 3

or so?

This gives tooling (that cares) a way to enumerate the idle types, without
having to rely on their numeric values. Adding a new line to schedstat
shouldn't break existing tooling - and if it does, we've increased
SCHEDSTAT_VERSION to 16 anyway. ;-)
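
As a sketch of what the tooling side might look like, assuming the mockup
format above is adopted as-is (a "cpu_idle_type" tag followed by
space-separated name/index pairs), a parser could build a small name-to-column
table. The struct and function names here are hypothetical, and the line
format itself is only a proposal at this point:

```c
#include <stdio.h>
#include <string.h>

#define MAX_IDLE_NAMES 8
#define MAX_NAME_LEN   32

/* One "name index" pair from the proposed cpu_idle_type line. */
struct idle_type_entry {
	char name[MAX_NAME_LEN];
	int  index;
};

/*
 * Parse a line of the (proposed) form:
 *   cpu_idle_type CPU_NOT_IDLE 0 CPU_IDLE 1 ...
 * Returns the number of pairs parsed, or -1 if the line does not
 * start with the "cpu_idle_type" tag.
 */
static int parse_idle_types(const char *line,
			    struct idle_type_entry *out, int max)
{
	static const char tag[] = "cpu_idle_type";
	size_t tag_len = sizeof(tag) - 1;
	int n = 0, used;

	if (strncmp(line, tag, tag_len) != 0 || line[tag_len] != ' ')
		return -1;
	line += tag_len;

	/* %n records how many characters each name/index pair consumed. */
	while (n < max &&
	       sscanf(line, " %31s %d%n",
		      out[n].name, &out[n].index, &used) == 2) {
		line += used;
		n++;
	}
	return n;
}

/* Look up the column index for a symbolic name, or -1 if it is absent. */
static int idle_type_index(const struct idle_type_entry *tab, int n,
			   const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return tab[i].index;
	return -1;
}
```

A tool would call parse_idle_types() on each schedstat line until it finds
the tag, then use idle_type_index(tab, n, "CPU_IDLE") and friends to pick
the right column group in the domain lines, instead of hardcoding 0/1/2.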

Thanks,

Ingo