Re: [PATCH] sched/fair: Return the busiest group only if imbalance is meaningful

From: Vincent Guittot
Date: Wed Apr 06 2022 - 13:28:30 EST


On Wed, 6 Apr 2022 at 17:07, 彭志刚 <zgpeng.linux@xxxxxxxxx> wrote:
>
> YES. The following are specific scenarios where negative values occur:
>
>
>
> The scheduler domain contains four groups, namely groupA, groupB, groupC, and groupD;
>
> The types and avg_load conditions of the four groups are as follows
>
>
>
> GroupA TYPE= group_fully_busy avg_load=10 [local group]
>
>
>
> GroupB TYPE= group_has_spare avg_load=1
>
> GroupC TYPE= group_has_spare avg_load=1
>
> GroupD TYPE= group_overloaded avg_load=20 [busiest group]
>
>
>
> The CPU that calls load_balance is located in groupA, and update_sd_lb_stats will select the busiest group in GroupB, groupC, and
>
> groupD, that is, gorupD. Under this condition, other judgments in the find_busiest_group function will be bypassed and the
>
> calculate_imbalance function will be called. The judgment in the middle of the calculate_imbalance function cannot stop this
>
> situation, and it will go to the imbalance calculate logic at the end of the function.At this time, the domain's avg_load=8, but
>
> the local_groupthe's avg_load=10; The negative value is therefore generated.

I think that this case should be covered by an additional test in
calculate_imbalance because we should not try to pull load in groupA
if it's already above the average load. Something like below should
cover this

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9469,6 +9469,16 @@ static inline void calculate_imbalance(struct
lb_env *env, struct sd_lb_stats *s

sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) /
sds->total_capacity;
+
+ /*
+ * Don't pull any tasks if this group is already above the
+ * domain average load.
+ */
+ if (local->avg_load >= sds->avg_load) {
+ env->imbalance = 0;
+ return;
+ }
+
/*
* If the local group is more loaded than the selected
* busiest group don't try to pull any tasks.

>
>
> Vincent Guittot <vincent.guittot@xxxxxxxxxx> 于2022年4月6日周三 20:59写道:
>>
>> On Wed, 6 Apr 2022 at 13:23, zgpeng <zgpeng.linux@xxxxxxxxx> wrote:
>> >
>> > When calculate_imbalance function calculate the imbalance, it may
>> > actually get a negative number. In this case, it is meaningless to
>>
>> We should not return a negative imbalance but I suppose this can
>> happen when we are using the avg_load metrics to calculate imbalance.
>> Have you faced a use case where imbalance is negative ?
>>
>> > return the so-called busiest group and continue to search for the
>> > busiest cpu later. Therefore, only when the imbalance is greater
>> > than 0 should return the busiest group, otherwise return NULL.
>> >
>> > Signed-off-by: zgpeng <zgpeng@xxxxxxxxxxx>
>> > Reviewed-by: Samuel Liao <samuelliao@xxxxxxxxxxx>
>> > ---
>> > kernel/sched/fair.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 601f8bd..9f75303 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -9639,7 +9639,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>> > force_balance:
>> > /* Looks like there is an imbalance. Compute it */
>> > calculate_imbalance(env, &sds);
>> > - return env->imbalance ? sds.busiest : NULL;
>> > + return env->imbalance > 0 ? sds.busiest : NULL;
>> >
>> > out_balanced:
>> > env->imbalance = 0;
>> > --
>> > 2.9.5
>> >