Re: [Patch v3 1/6] sched/fair: Determine active load balance for SMT sched groups

From: Peter Zijlstra
Date: Mon Jul 17 2023 - 07:11:37 EST


On Mon, Jul 17, 2023 at 01:06:59AM +0530, Shrikanth Hegde wrote:
>
>
> On 7/15/23 4:35 AM, Tim Chen wrote:
> > On Fri, 2023-07-14 at 18:36 +0530, Shrikanth Hegde wrote:
> >
> >>
> >>
> >> If we consider symmetric platforms which have SMT4 such as power10.
> >> we have a topology like below. multiple such MC will form DIE(PKG)
> >>
> >>
> >> [0 2 4 6][1 3 5 7][8 10 12 14][9 11 13 15]
> >> [--SMT--][--SMT--][----SMT---][---SMT----]
> >> [--sg1--][--sg1--][---sg1----][---sg1----]
> >> [--------------MC------------------------]
> >>
> >> In case of SMT4, if there is any group which has 2 or more tasks, that
> >> group will be marked as group_smt_balance. previously, if that group had 2
> >> or 3 tasks, it would have been marked as group_has_spare. Since all the groups have
> >> SMT that means behavior would be same fully busy right? That can cause some
> >> corner cases. No?
> >
> > You raised a good point. I was looking from SMT2
> > perspective so group_smt_balance implies group_fully_busy.
> > That is no longer true for SMT4.
> >
> > I am thinking of the following fix on the current patch
> > to take care of SMT4. Do you think this addresses
>
> Thanks Tim for taking a look at it again.
>
> Yes. I think this would address some of the corner cases.
> Any SMT4 group having 2,3,4 will have smt_balance as the group type, and busiest one
> is the one which has least number of idle cpu's. (same conditions as group_has_spare)
>
>
>
>
> > concerns from you and Tobias?
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 294a662c9410..3fc8d3a3bd22 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9588,6 +9588,17 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > break;
> >
> > case group_smt_balance:
> > + /* no idle cpus on both groups handled by group_fully_busy below */
> > + if (sgs->idle_cpus != 0 || busiest->idle_cpus != 0) {
> > + if (sgs->idle_cpus > busiest->idle_cpus)
> > + return false;
> > + if (sgs->idle_cpus < busiest->idle_cpus)
> > + return true;
> > + if (sgs->sum_nr_running <= busiest_sum_nr_running)
> > + return false;
> > + else
> > + return true;
> > + }
> >
> >
> > I will be on vacation next three weeks so my response will be slow.
> >
> > Tim
> >
> >>
>
> Small suggestion to above code to avoid compiler warning of switch case falling
> through and else case can be removed, since update_sd_pick_busiest by default returns true.
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e5a75c76bcaa..ae364ac6f22e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9728,9 +9728,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> return true;
> if (sgs->sum_nr_running <= busiest->sum_nr_running)
> return false;
> - else
> - return true;
> }
> + break;
> +
> case group_fully_busy:
> /*
> * Select the fully busy group with highest avg_load. In
>
>

Can someone please send a full patch for this? I've already queued Tim's
patches in tip/sched/core (tip-bot seems to have died somewhere last
week, it's being worked on).