Re: [PATCH] fix scheduler regression from "sched/fair: Rework load_balance()"

From: Chris Mason
Date: Mon Oct 26 2020 - 12:50:33 EST


On 26 Oct 2020, at 12:20, Vincent Guittot wrote:

On Monday, 26 Oct 2020 at 12:04:45 (-0400), Rik van Riel wrote:
On Mon, 26 Oct 2020 16:42:14 +0100
Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
On Mon, 26 Oct 2020 at 16:04, Rik van Riel <riel@xxxxxxxxxxx> wrote:

Could utilization estimates be off, either lagging or
simply wrong for a task, resulting in no task getting
pulled sometimes, while a migrate_task imbalance always
moves something over?

Task and CPU utilization are not always fully in sync and may lag
a bit, which explains why LB can sometimes fail to migrate when the
diff is small.
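
(To make that lag concrete: the sketch below is a stand-alone user-space
illustration, not kernel code. A PELT-style geometric average with the
kernel's roughly 32ms half-life takes tens of milliseconds to reflect a
task that just started running flat out, so a balancer comparing such
averages can see a near-zero diff while one CPU already has an extra
runnable task.)

/*
 * Illustrative user-space sketch (not kernel code): a PELT-like
 * geometric average with a ~32ms half-life. A task that just went
 * from idle to running 100% only reads ~512/1024 after 32ms, so
 * comparing these averages right after a wakeup or migration can
 * show almost no imbalance.
 */
#include <stdio.h>
#include <math.h>

int main(void)
{
	/* decay per 1ms period, chosen so that y^32 = 0.5 */
	const double y = pow(0.5, 1.0 / 32.0);
	double util = 0.0;
	int ms;

	for (ms = 1; ms <= 64; ms++) {
		/* the task runs 100% of this period */
		util = util * y + (1.0 - y);
		if ((ms & (ms - 1)) == 0)	/* print at powers of two */
			printf("after %2d ms: util_avg ~ %4.0f/1024\n",
			       ms, util * 1024.0);
	}
	return 0;
}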

OK, running with this little snippet below, I see latencies
improve back to near where they used to be:

Latency percentiles (usec) runtime 150 (s)
50.0th: 13
75.0th: 31
90.0th: 69
95.0th: 90
*99.0th: 761
99.5th: 2268
99.9th: 9104
min=1, max=16158

I suspect the right/cleaner approach might be to use
migrate_task more in !CPU_NOT_IDLE cases?
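
(Purely as an illustration of that idea, and emphatically not the snippet
Rik actually ran, which is not quoted here: something along these lines in
calculate_imbalance() would make an idle destination CPU fall back to
pulling a whole task whenever the utilization math would otherwise pull
nothing.)

/*
 * Hypothetical sketch only: if the destination CPU is idle
 * (env->idle != CPU_NOT_IDLE) and the utilization-based imbalance
 * rounded down to nothing, move a whole task instead, so a lagging
 * util estimate cannot make the balancer pull nothing at all.
 */
if (env->idle != CPU_NOT_IDLE &&
    env->migration_type == migrate_util &&
    !env->imbalance) {
	env->migration_type = migrate_task;
	env->imbalance = 1;
}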

Moving a task to an idle CPU immediately, instead of refusing
to have the load balancer move it, improves latencies for fairly
obvious reasons.

I am not entirely clear on why the load balancer should need to
be any more conservative about moving tasks than the wakeup
path is in, e.g., select_idle_sibling.
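
(For comparison, the wakeup-side policy Rik refers to boils down to
something like the reduced sketch below; the real select_idle_sibling()
in kernel/sched/fair.c handles many more cases, but the point stands: an
idle CPU in the LLC wins outright, with no utilization comparison
involved.)

/*
 * Heavily reduced, kernel-context sketch of the wakeup path's
 * placement policy; not the actual select_idle_sibling() code.
 */
static int sketch_idle_cpu_in_llc(struct sched_domain *sd, int target)
{
	int cpu;

	/* scan the LLC starting at target; any idle CPU is good enough */
	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
		if (available_idle_cpu(cpu))
			return cpu;
	}

	return target;
}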


What you are suggesting is something like:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4978964e75e5..3b6fbf33abc2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9156,7 +9156,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
* emptying busiest.
*/
if (local->group_type == group_has_spare) {
- if (busiest->group_type > group_fully_busy) {
+ if ((busiest->group_type > group_fully_busy) &&
+ !(env->sd->flags & SD_SHARE_PKG_RESOURCES)) {
/*
* If busiest is overloaded, try to fill spare
* capacity. This might end up creating spare capacity

which also fixes the problem for me and aligns LB with the wakeup path
regarding migration within the LLC
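
(My reading of the patch above, for anyone following along:
SD_SHARE_PKG_RESOURCES is set on domains whose CPUs share the LLC, so
within the LLC the group_has_spare case no longer takes the
fill-spare-capacity/migrate_util branch and instead falls through to the
code that moves whole tasks, matching the wakeup path's behavior.
Roughly:)

/* Paraphrase of the resulting decision, not code from the thread: */
if (local->group_type == group_has_spare) {
	if (busiest->group_type > group_fully_busy &&
	    !(env->sd->flags & SD_SHARE_PKG_RESOURCES)) {
		/* across LLCs: balance an amount of utilization */
		env->migration_type = migrate_util;
	} else {
		/* within the LLC: fall through and move whole tasks */
		env->migration_type = migrate_task;
	}
}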

Vincent’s patch on top of 5.10-rc1 looks pretty great:

Latency percentiles (usec) runtime 90 (s) (3320 total samples)
50.0th: 161 (1687 samples)
75.0th: 200 (817 samples)
90.0th: 228 (488 samples)
95.0th: 254 (164 samples)
*99.0th: 314 (131 samples)
99.5th: 330 (17 samples)
99.9th: 356 (13 samples)
min=29, max=358

Next we'll test in prod, which probably won't give us answers until tomorrow. Thanks again, Vincent!

-chris