Re: [PATCH v4 05/12] sched/fair: Keep a fully_busy SMT sched group as busiest

From: Ricardo Neri
Date: Thu May 18 2023 - 19:58:34 EST


On Sat, May 13, 2023 at 12:11:45AM +0530, Shrikanth Hegde wrote:
>
>
> On 4/7/23 2:01 AM, Ricardo Neri wrote:
> > When comparing two fully_busy scheduling groups, keep the current busiest
> > group if it represents an SMT core. Tasks in such scheduling group share
> > CPU resources and need more help than tasks in a non-SMT fully_busy group.
> >
> > Cc: Ben Segall <bsegall@xxxxxxxxxx>
> > Cc: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
> > Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> > Cc: Ionela Voinescu <ionela.voinescu@xxxxxxx>
> > Cc: Len Brown <len.brown@xxxxxxxxx>
> > Cc: Mel Gorman <mgorman@xxxxxxx>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Cc: Tim C. Chen <tim.c.chen@xxxxxxxxx>
> > Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
> > Cc: x86@xxxxxxxxxx
> > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > Tested-by: Zhang Rui <rui.zhang@xxxxxxxxx>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > ---
> > Changes since v3:
> > * None
> >
> > Changes since v2:
> > * Introduced this patch.
> >
> > Changes since v1:
> > * N/A
> > ---
> > kernel/sched/fair.c | 16 ++++++++++++++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index b151e93ec316..ea23a5163bfa 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9566,10 +9566,22 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > * contention when accessing shared HW resources.
> > *
> > * XXX for now avg_load is not computed and always 0 so we
> > - * select the 1st one.
> > + * select the 1st one, except if @sg is composed of SMT
> > + * siblings.
> > */
> > - if (sgs->avg_load <= busiest->avg_load)
> > +
> > + if (sgs->avg_load < busiest->avg_load)
> > return false;
> > +
> > + if (sgs->avg_load == busiest->avg_load) {
> > + /*
> > + * SMT sched groups need more help than non-SMT groups.
> > + * If @sg happens to also be SMT, either choice is good.
> > + */
> > + if (sds->busiest->flags & SD_SHARE_CPUCAPACITY)
> > + return false;
> > + }
> > +
> > break;
>

Thank you very much for your review!

> IIUC,
>
> Earlier, we used to go to out_balanced if sgs->avg_load <= busiest->avg_load.
> Now we go only if it is less.

In this particular case we are comparing to fully_busy groups. Both
sgs->avg_load and busiest->avg_load are equal to zero 0.

> lets say sgs->avg_load == busiest->avg_load,
> then we will return true in MC,DIE domain. This might end up traversing
> multiple such group's and pick the last one as the busiest instead of
> first.

Yes, that is correct. But we traverse all sched groups from
update_sd_lb_stats() anyway. We are here because both sgs and busiest are
of type fully_busy and we need to break a tie. Previously we always kept
on selecting sgs as busiest.

> I guess eventually any load balance if exists will be fixed. But
> this might cause slight overhead. would it?
>
>
>
> nit: There is typo in [2/12] if the whole core is repeated.
> + * CPUs. When done between cores, do it only if the whole core if the
> + * whole core is idle.
>
> Mentioning in this reply instead, to avoid sending another mail reply for this.

Ah! I read my patches dozens of times and I still missed this. Thank you
for noting. I will post a trivial patch to fix it.

Thanks and BR,
Ricardo