Re: [PATCH 1/9] sched/balancing: Switch the 'DEFINE_SPINLOCK(balancing)' spinlock into an 'atomic_t sched_balance_running' flag

From: Ingo Molnar
Date: Fri Mar 08 2024 - 04:49:13 EST



* Valentin Schneider <vschneid@xxxxxxxxxx> wrote:

> On 04/03/24 10:48, Ingo Molnar wrote:
> > The 'balancing' spinlock added in:
> >
> > 08c183f31bdb ("[PATCH] sched: add option to serialize load balancing")
> >
> > ... is taken when the SD_SERIALIZE flag is set in a domain, but in reality it
> > is a glorified global atomic flag serializing the load-balancing of
> > those domains.
> >
> > It doesn't have any explicit locking semantics per se: we just
> > spin_trylock() it.
> >
> > Turn it into a ... global atomic flag. This makes it more
> > clear what is going on here, and reduces overhead and code
> > size a bit:
> >
> > # kernel/sched/fair.o: [x86-64 defconfig]
> >
> >      text    data     bss     dec     hex filename
> >     60730    2721     104   63555    f843 fair.o.before
> >     60718    2721     104   63543    f837 fair.o.after
> >
> > Also document the flag a bit.
> >
> > No change in functionality intended.
> >
> > Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
>
> Few comment nits, otherwise:
>
> Reviewed-by: Valentin Schneider <vschneid@xxxxxxxxxx>

Thanks!

> > -static DEFINE_SPINLOCK(balancing);
> > +/*
> > + * This flag serializes load-balancing passes over large domains
> > + * (such as SD_NUMA) - only once load-balancing instance may run
>                                ^^^^
> s/once/one/
>
> Also, currently the flag is only set for domains above the NODE topology
> level, sd_init() will reject an architecture that forces SD_SERIALIZE in a
> topology level's ->sd_flags(), so what about:
>
> s/(such as SD_NUMA)/(above the NODE topology level)

Agreed & done.

> > + * at a time, to reduce overhead on very large systems with lots
> > + * of CPUs and large NUMA distances.
> > + *
> > + * - Note that load-balancing passes triggered while another one
> > + * is executing are skipped and not re-tried.
> > + *
> > + * - Also note that this does not serialize sched_balance_domains()
>                                                ^^^^^^^^^^^^^^^^^^^^^
> Did you mean rebalance_domains()?

Correct: a later rename that I have not posted yet, which unifies the
nomenclature of all the rebalancing functions under the sched_balance_*()
prefix, crept back into this comment block. This patch should obviously
refer to the current namespace. Fixed.

Thanks,

Ingo
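
For readers following the thread, the spinlock-to-flag conversion discussed
above can be sketched in userspace C11 roughly as follows. This is only an
illustration, not the code from the patch: 'balance_running',
balance_try_begin(), balance_end() and run_balance_pass() are hypothetical
names, and C11 <stdatomic.h> stands in for the kernel's atomic_t API. It
shows a flag that is claimed with a single compare-and-exchange, and a pass
that is skipped, not retried, when the flag is already held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a global 'atomic_t' balancing flag. */
static atomic_int balance_running = 0;

/* Try to claim the flag; returns false if another pass already holds it. */
static bool balance_try_begin(void)
{
	int expected = 0;

	/* 0 -> 1 with acquire ordering on success, much like a trylock. */
	return atomic_compare_exchange_strong_explicit(&balance_running,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

/* Drop the flag with release ordering, much like an unlock. */
static void balance_end(void)
{
	atomic_store_explicit(&balance_running, 0, memory_order_release);
}

/* Returns true if the pass ran, false if it was skipped (never retried). */
static bool run_balance_pass(bool need_serialize)
{
	if (need_serialize && !balance_try_begin())
		return false;

	/* ... the serialized load-balancing work would go here ... */

	if (need_serialize)
		balance_end();
	return true;
}
```

Passes that do not need serialization never touch the flag, which mirrors
the way only domains with SD_SERIALIZE set took the old spinlock.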