Re: [PATCH] sched/topology: Fix overlapping sched_group build

From: Valentin Schneider
Date: Wed Mar 25 2020 - 13:54:08 EST



On Tue, Mar 24 2020, Valentin Schneider wrote:
> kernel/sched/topology.c | 23 ++++++++++++++++++++---
> 1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 8344757bba6e..7033b27e5162 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -866,7 +866,7 @@ build_balance_mask(struct sched_domain *sd, struct sched_group *sg, struct cpuma
> continue;
>
> /* If we would not end up here, we can't continue from here */
> - if (!cpumask_equal(sg_span, sched_domain_span(sibling->child)))
> + if (!cpumask_subset(sg_span, sched_domain_span(sibling->child)))

So this is one source of issues: what I've done here is a bit stupid,
since the subset check includes CPUs that *cannot* end up there. What I
should've done is something like:

	cpumask_and(tmp, sched_domain_span(sibling->child), sched_domain_span(sd));
	if (!cpumask_equal(sg_span, tmp))
		...
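
To make the difference between the checks concrete, here is a rough
userspace sketch. It is not kernel code: toy_mask, toy_range() and the
accept_*() helpers are made-up names standing in for struct cpumask and
the condition in build_balance_mask(), and the spans used with them
below are hypothetical values, not taken from a real topology.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t toy_mask;

/* Build a mask covering CPUs first..last inclusive (requires last < 64). */
static toy_mask toy_range(unsigned int first, unsigned int last)
{
	toy_mask m = 0;

	for (unsigned int cpu = first; cpu <= last; cpu++)
		m |= (toy_mask)1 << cpu;
	return m;
}

/* Same semantics as cpumask_equal(): both masks hold the same CPUs. */
static bool toy_equal(toy_mask a, toy_mask b)
{
	return a == b;
}

/* Same semantics as cpumask_subset(): every CPU in a is also in b. */
static bool toy_subset(toy_mask a, toy_mask b)
{
	return (a & ~b) == 0;
}

/* Check from the quoted patch: sg_span only has to fit inside the child span. */
static bool accept_subset(toy_mask sg_span, toy_mask child_span)
{
	return toy_subset(sg_span, child_span);
}

/* Check suggested above: clamp the child span to the domain span, then compare. */
static bool accept_clamped(toy_mask sg_span, toy_mask child_span, toy_mask sd_span)
{
	return toy_equal(sg_span, child_span & sd_span);
}
```

With a hypothetical sg_span of 4-5 and a domain span of 0-5, a sibling
whose child spans 2-7 passes accept_subset() but fails
accept_clamped(), because the clamped span 2-5 is not 4-5. A sibling
whose child spans 4-7 fails the plain equality check yet passes
accept_clamped(), because 4-7 clamped to 0-5 is exactly 4-5.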

But even with that I just uncover even more horrors: this breaks the
overlapping sched_group_capacity handling (see 1676330ecfa8
("sched/topology: Fix overlapping sched_group_capacity")).

For instance, here I would have

CPU0-domain2-group4: span=4-5
CPU4-domain2-group4: span=4-7 mask=4-5

Both groups are at the same topology level and have the same first CPU,
so they point to the same sched_group_capacity structure - but they
don't have the same span. They would without my "fix", but then the
group spans are back to being wrong. I'm starting to think this is
doomed, at least in the current state of things :/
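
The aliasing described above can be modelled in a few lines of
userspace C. This is only a sketch under assumptions: the toy_* names
are invented, the per-level array stands in for the per-CPU
sched_group_capacity storage, and the lookup by first CPU of the
balance mask is a simplification of what the kernel does. The balance
mask of the CPU0 group is not shown in the dump above, so 4-5 is
assumed for it here.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 8

/* Toy stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t toy_mask;

/* Toy stand-in for struct sched_group_capacity. */
struct toy_sgc {
	unsigned long capacity;
};

/* Toy stand-in for struct sched_group. */
struct toy_group {
	toy_mask span;
	toy_mask balance_mask;
	struct toy_sgc *sgc;
};

/* Lowest CPU in the mask; the mask must be non-empty and below TOY_NR_CPUS. */
static unsigned int toy_first_cpu(toy_mask m)
{
	unsigned int cpu = 0;

	while (cpu < 64 && !(m & ((toy_mask)1 << cpu)))
		cpu++;
	return cpu;
}

/* Pick the capacity slot for this topology level by first balance CPU. */
static void toy_attach_sgc(struct toy_group *sg, struct toy_sgc *level_sgc)
{
	sg->sgc = &level_sgc[toy_first_cpu(sg->balance_mask)];
}

/* Two groups conflict if they share a capacity slot but not a span. */
static int toy_groups_conflict(const struct toy_group *a,
			       const struct toy_group *b)
{
	return a->sgc == b->sgc && a->span != b->span;
}

/* Rebuild the two groups from the dump above and report whether they clash. */
static int toy_demo_conflict(void)
{
	struct toy_sgc level_sgc[TOY_NR_CPUS] = { { 0 } };
	/* CPU0-domain2-group4: span=4-5 (balance mask 4-5 is an assumption) */
	struct toy_group from_cpu0 = { .span = 0x30, .balance_mask = 0x30 };
	/* CPU4-domain2-group4: span=4-7 mask=4-5 */
	struct toy_group from_cpu4 = { .span = 0xf0, .balance_mask = 0x30 };

	toy_attach_sgc(&from_cpu0, level_sgc);
	toy_attach_sgc(&from_cpu4, level_sgc);
	return toy_groups_conflict(&from_cpu0, &from_cpu4);
}
```

In this model both groups resolve to slot 4, so toy_demo_conflict()
reports a clash: one capacity structure would have to describe both
span 4-5 and span 4-7.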