Re: [Patch v2 2/6] sched/topology: Record number of cores in sched group

From: Peter Zijlstra
Date: Mon Jun 12 2023 - 07:39:22 EST


On Thu, Jun 08, 2023 at 03:32:28PM -0700, Tim Chen wrote:
> From: Tim C Chen <tim.c.chen@xxxxxxxxxxxxxxx>
>
> When balancing sibling domains that have different numbers of cores,
> the tasks in each sibling domain should be proportional to the number
> of cores in that domain. In preparation for implementing such a policy,
> record the number of cores in a scheduling group.
>
> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> ---
> kernel/sched/sched.h | 1 +
> kernel/sched/topology.c | 10 +++++++++-
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3d0eb36350d2..5f7f36e45b87 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1860,6 +1860,7 @@ struct sched_group {
>  	atomic_t ref;
>
>  	unsigned int group_weight;
> +	unsigned int cores;
>  	struct sched_group_capacity *sgc;
>  	int asym_prefer_cpu; /* CPU of highest priority in group */
>  	int flags;
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 6d5628fcebcf..6b099dbdfb39 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1275,14 +1275,22 @@ build_sched_groups(struct sched_domain *sd, int cpu)
>  static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
>  {
>  	struct sched_group *sg = sd->groups;
> +	struct cpumask *mask = sched_domains_tmpmask2;
>
>  	WARN_ON(!sg);
>
>  	do {
> -		int cpu, max_cpu = -1;
> +		int cpu, cores = 0, max_cpu = -1;
>
>  		sg->group_weight = cpumask_weight(sched_group_span(sg));
>
> +		cpumask_copy(mask, sched_group_span(sg));
> +		for_each_cpu(cpu, mask) {
> +			cores++;
> +			cpumask_andnot(mask, mask, cpu_smt_mask(cpu));
> +		}
> +		sg->cores = cores;
> +
>  		if (!(sd->flags & SD_ASYM_PACKING))
>  			goto next;

Just a note; not sure we want to, or can, do anything about this, but
consider someone doing partitions like:

[0,1] [2,3] [4,5]
[------] [------]

That is, 3 SMT cores, and 2 partitions splitting an SMT core in two.

Then the domain trees will each see either CPU 2 or CPU 3, but not the full core.

I'm perfectly fine with saying: don't do that then.
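
To make the counting concrete, here is a minimal userspace sketch of the
same walk, not kernel code. It assumes CPUs 2n and 2n+1 are SMT siblings
and stores a span in a plain 32-bit mask; smt_mask() and count_cores() are
hypothetical stand-ins for cpu_smt_mask() and the loop in the patch.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for cpu_smt_mask(): assumes CPUs 2n and 2n+1
 * are the two SMT siblings of one core.
 */
static unsigned int smt_mask(int cpu)
{
	return 3u << (cpu & ~1);
}

/*
 * Mirrors the loop in the patch: every CPU still left in the mask
 * counts as one core, then all of its siblings are removed, whether
 * or not they were part of the span to begin with.
 */
static unsigned int count_cores(unsigned int span)
{
	unsigned int mask = span, cores = 0;
	int cpu;

	for (cpu = 0; cpu < 32; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;
		cores++;
		mask &= ~smt_mask(cpu);
	}
	return cores;
}
```

With these assumptions, count_cores(0x3f) for the unpartitioned span 0-5
gives 3, while count_cores(0x07) for CPUs 0-2 and count_cores(0x38) for
CPUs 3-5 each give 2. The split core is counted once in each partition,
so the two partitions together report 4 cores on a 3-core machine.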