Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval

From: Valentin Schneider
Date: Mon Dec 17 2018 - 11:56:24 EST


Hi Vincent,

About time I had a look at this one...

On 14/12/2018 16:01, Vincent Guittot wrote:
> In case of active balance, we increase the balance interval to cover
> pinned tasks cases not covered by all_pinned logic.

AFAIUI the balance increase is there to have plenty of time to
stop the task before going through another load_balance().

Seeing as there is a cpus_allowed check that leads to

env.flags |= LBF_ALL_PINNED;
goto out_one_pinned;

in the active balancing part of load_balance(), the condition you're
changing should never be hit when we have pinned tasks. So you may want
to rephrase that bit.

> Neverthless, the active
> migration triggered by asym packing should be treated as the normal
> unbalanced case and reset the interval to default value otherwise active
> migration for asym_packing can be easily delayed for hundreds of ms
> because of this all_pinned detection mecanism.

Mmm so it's not exactly clear why we need this change. If we double the
interval because of a pinned task we wanted to active balance, well it's
just regular pinned task issues and we can't do much about it.

The only scenario I can think of is if you had a workload where you wanted
to do an active balance in several successive load_balance(), in which case
you would keep increasing the interval even though you do migrate a task
every time (which would harm every subsequent active_balance).

In that case every active_balance "user" (pressured CPU, misfit) is
affected, so maybe what we want is something like this:

-----8<-----
@@ -9136,13 +9149,11 @@ static int load_balance(int this_cpu, struct rq *this_rq,
sd->balance_interval = sd->min_interval;
} else {
/*
- * If we've begun active balancing, start to back off. This
- * case may not be covered by the all_pinned logic if there
- * is only 1 task on the busy runqueue (because we don't call
- * detach_tasks).
+ * If we've begun active balancing, start to back off.
+ * Don't increase too much in case we have more active balances
+ * coming up.
*/
- if (sd->balance_interval < sd->max_interval)
- sd->balance_interval *= 2;
+ sd->balance_interval = 2 * sd->min_interval;
}

goto out;
----->8-----

Maybe we want a larger threshold - truth be told, it all depends on how
long the cpu stopper can take and if that delay increase is still relevant
nowadays.

>
> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> ---
> kernel/sched/fair.c | 27 +++++++++++++++------------
> 1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9591e7a..487287e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> */
> #define MAX_PINNED_INTERVAL 512
>
> +static inline bool
> +asym_active_balance(struct lb_env *env)
> +{
> + /*
> + * ASYM_PACKING needs to force migrate tasks from busy but
> + * lower priority CPUs in order to pack all tasks in the
> + * highest priority CPUs.
> + */
> + return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> + sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +}
> +
> static int need_active_balance(struct lb_env *env)
> {
> struct sched_domain *sd = env->sd;
>
> - if (env->idle != CPU_NOT_IDLE) {
> -
> - /*
> - * ASYM_PACKING needs to force migrate tasks from busy but
> - * lower priority CPUs in order to pack all tasks in the
> - * highest priority CPUs.
> - */
> - if ((sd->flags & SD_ASYM_PACKING) &&
> - sched_asym_prefer(env->dst_cpu, env->src_cpu))
> - return 1;
> - }
> + if (asym_active_balance(env))
> + return 1;
>
> /*
> * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> @@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> } else
> sd->nr_balance_failed = 0;
>
> - if (likely(!active_balance)) {
> + if (likely(!active_balance) || asym_active_balance(&env)) {

AFAICT this makes us reset the interval whenever we do an asym packing
active balance (if the task is pinned we would goto out_one_pinned).
This goes against the logic I had understood so far and that I explained
higher up.

> /* We were unbalanced, so reset the balancing interval */
> sd->balance_interval = sd->min_interval;
> } else {
>