Re: [PATCH 4/9] lib/group_cpus: optimize outer loop in grp_spread_init_one()

From: Ming Lei
Date: Sat Jan 20 2024 - 01:18:02 EST


On Sat, Jan 20, 2024 at 11:51:58AM +0800, Ming Lei wrote:
> On Fri, Jan 19, 2024 at 06:50:48PM -0800, Yury Norov wrote:
> > Similarly to the inner loop, in the outer loop we can use for_each_cpu()
> > macro, and skip CPUs that have been moved.
> >
> > With this patch, the function becomes O(1), despite that it's a
> > double-loop.
> >
> > While here, add a comment why we can't merge outer logic into the inner
> > loop.
> >
> > Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx>
> > ---
> > lib/group_cpus.c | 14 ++++++++------
> > 1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > index 0a8ac7cb1a5d..952aac9eaa81 100644
> > --- a/lib/group_cpus.c
> > +++ b/lib/group_cpus.c
> > @@ -17,16 +17,17 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > const struct cpumask *siblmsk;
> > int cpu, sibl;
> >
> > - for ( ; cpus_per_grp > 0; ) {
> > - cpu = cpumask_first(nmsk);
> > -
> > - /* Should not happen, but I'm too lazy to think about it */
> > - if (cpu >= nr_cpu_ids)
> > + for_each_cpu(cpu, nmsk) {
> > + if (cpus_per_grp-- == 0)
> > return;
> >
> > + /*
> > + * If a caller wants to spread IRQs on offline CPUs, we need to
> > + * take care of it explicitly because those offline CPUs are not
> > + * included in siblings cpumask.
> > + */
> > __cpumask_clear_cpu(cpu, nmsk);
> > __cpumask_set_cpu(cpu, irqmsk);
> > - cpus_per_grp--;
> >
> > /* If the cpu has siblings, use them first */
> > siblmsk = topology_sibling_cpumask(cpu);
> > @@ -38,6 +39,7 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> >
> > __cpumask_clear_cpu(sibl, nmsk);
> > __cpumask_set_cpu(sibl, irqmsk);
> > + cpu = sibl + 1;
>
> Updating the condition variable of for_each_cpu() inside the loop is
> already tricky enough (this kind of pattern can't be built in Rust at
> all), and the above line could actually be even trickier.

Not only is the above line tricky, it is also wrong, because the local
variable 'cpu' should always point to the first set bit in 'nmsk'.
However, if you set it to 'sibl + 1', some bits in 'nmsk' are skipped in
the loop, aren't they?
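
To make the concern concrete, here is a minimal userspace sketch, not the
kernel code. It assumes at most 64 CPUs held in a uint64_t, and the names
next_bit(), for_each_bit() and spread_one() are made up for illustration.
The iterator is modelled on for_each_set_bit() (re-scan from 'bit', then
bit++), and the inner sibling walk is a simplified model of the patched
loop. The 'advance' flag toggles the 'cpu = sibl + 1' assignment.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for find_next_bit(): first set bit at or after 'start', else 'size'. */
static unsigned int next_bit(uint64_t mask, unsigned int size, unsigned int start)
{
	for (unsigned int i = start; i < size; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return size;
}

/* Same shape as for_each_set_bit(): re-scan the (possibly modified) mask. */
#define for_each_bit(bit, mask, size) \
	for ((bit) = 0; \
	     (bit) = next_bit((mask), (size), (bit)), (bit) < (size); \
	     (bit)++)

/*
 * Simplified model of the patched grp_spread_init_one(): move up to
 * 'budget' CPUs from *nmsk into the returned mask, siblings first.
 * siblmsk[cpu] is a hypothetical per-CPU sibling mask.
 */
static uint64_t spread_one(uint64_t *nmsk, const uint64_t *siblmsk,
			   unsigned int ncpus, unsigned int budget, int advance)
{
	uint64_t irqmsk = 0, sib;
	unsigned int cpu, sibl;

	assert(ncpus <= 64);

	for_each_bit(cpu, *nmsk, ncpus) {
		if (budget-- == 0)
			break;

		*nmsk &= ~(UINT64_C(1) << cpu);
		irqmsk |= UINT64_C(1) << cpu;

		/* Walk the siblings above 'cpu' that are still in *nmsk. */
		sib = siblmsk[cpu];
		for (sibl = next_bit(sib & *nmsk, ncpus, cpu + 1);
		     sibl < ncpus;
		     sibl = next_bit(sib & *nmsk, ncpus, sibl + 1)) {
			if (budget-- == 0)
				return irqmsk;

			*nmsk &= ~(UINT64_C(1) << sibl);
			irqmsk |= UINT64_C(1) << sibl;
			if (advance)
				cpu = sibl + 1;	/* the line under review */
		}
	}
	return irqmsk;
}
```

With 8 CPUs, sibling pairs (0,4), (1,5), (2,6), (3,7), all bits set in
'nmsk' and a budget of 8, the model without the assignment moves every
CPU. With it, the outer loop resumes at CPU 6 after handling (0,4), so
CPUs 1, 2, 3 and 5 are left in 'nmsk' although budget remains.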


Thanks,
Ming