Re: [PATCH 1/8] sched/fair: Clean up active balance nr_balance_failed trickery

From: Valentin Schneider
Date: Wed Feb 03 2021 - 13:44:17 EST


>> @@ -9805,9 +9810,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>> active_load_balance_cpu_stop, busiest,
>> &busiest->active_balance_work);
>> }
>> -
>> - /* We've kicked active balancing, force task migration. */
>> - sd->nr_balance_failed = sd->cache_nice_tries+1;
>
> This has an impact on future calls to need_active_balance() too, no? We enter
> this path because need_active_balance() returned true; one of the conditions it
> checks for is
>
> return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
>
> So since we used to reset nr_balance_failed to cache_nice_tries+1, the above
> condition would be false in the next call or two IIUC. But since we remove
> that, we could end up here again soon.
>
> Was this intentional?
>

Partially, I'd say :-)

If you look at active_load_balance_cpu_stop(), it does

  sd->nr_balance_failed = 0;

when it successfully pulls a task. So we get a reset of the failed counter
on pull, which I've preserved. As for the interaction with later calls to
need_active_balance(), the commit that introduced the current counter write
(which is over 15 years old!):

3950745131e2 ("[PATCH] sched: fix SMT scheduling problems")

only states the task_hot() issue; thus I'm doubtful whether said
interaction was intentional.
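
To make the counter interaction concrete, here's a rough userspace model of it. This is only a sketch, not kernel code: struct sd_model and the model_*() helpers are made-up names, only the single need_active_balance() condition quoted above is modelled, and it assumes every failed balance attempt bumps the counter by one (in load_balance() that increment is conditional) and that no task is pulled in between, so the reset to 0 never happens.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two struct sched_domain fields discussed here. */
struct sd_model {
	unsigned int nr_balance_failed;
	unsigned int cache_nice_tries;
};

/* Only the one need_active_balance() condition quoted above; the rest is ignored. */
static bool model_need_active_balance(const struct sd_model *sd)
{
	return sd->nr_balance_failed > sd->cache_nice_tries + 2;
}

/*
 * Count the failed balance attempts needed for the condition above to hold
 * again, starting right after an active balance was kicked.
 *
 * old_write == true models the removed
 *	sd->nr_balance_failed = sd->cache_nice_tries+1;
 * old_write == false leaves the counter untouched, as the patch does.
 */
static unsigned int model_failures_until_retrigger(struct sd_model sd, bool old_write)
{
	unsigned int failures = 0;

	if (old_write)
		sd.nr_balance_failed = sd.cache_nice_tries + 1;

	do {
		/* Assumption: each failed attempt increments the counter once. */
		sd.nr_balance_failed++;
		failures++;
	} while (!model_need_active_balance(&sd));

	return failures;
}
```

Under those assumptions, with the old write the condition is false on the first failed attempt after the kick and true again on the second; without it, the very next failed attempt satisfies it. A successful pull still zeroes the counter either way, after which it takes cache_nice_tries+3 failures to get back here.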