Re: [PATCH] sched/fair: Introduce priority load balance for CFS

From: Song Zhang
Date: Fri Nov 11 2022 - 21:51:30 EST


Hi, Vincent

On 2022/11/3 17:22, Vincent Guittot wrote:
On Thu, 3 Nov 2022 at 10:20, Song Zhang <zhangsong34@xxxxxxxxxx> wrote:



On 2022/11/3 16:33, Vincent Guittot wrote:
On Thu, 3 Nov 2022 at 04:01, Song Zhang <zhangsong34@xxxxxxxxxx> wrote:

Thanks for your reply!

On 2022/11/3 2:01, Vincent Guittot wrote:
On Wed, 2 Nov 2022 at 04:54, Song Zhang <zhangsong34@xxxxxxxxxx> wrote:


This really looks like a v3 of
https://lore.kernel.org/all/20220810015636.3865248-1-zhangsong34@xxxxxxxxxx/

Please keep versioning.

Add a new sysctl interface:
/proc/sys/kernel/sched_prio_load_balance_enabled

We don't want to add more sysctl knobs for the scheduler, we even
removed some. Knob usually means that you want to fix your use case
but the solution doesn't make sense for all cases.


OK, I will remove this knobs later.


0: default behavior
1: enable priority load balance for CFS

For co-location with idle and non-idle tasks, when CFS do load balance,
it is reasonable to prefer migrating non-idle tasks and migrating idle
tasks lastly. This will reduce the interference by SCHED_IDLE tasks
as much as possible.

I don't agree that it's always the best choice to migrate a non-idle task 1st.

CPU0 has 1 non idle task and CPU1 has 1 non idle task and hundreds of
idle task and there is an imbalance between the 2 CPUS: migrating the
non idle task from CPU1 to CPU0 is not the best choice


If the non idle task on CPU1 is running or cache hot, it cannot be
migrated and idle tasks can also be migrated from CPU1 to CPU0. So I
think it does not matter.

What I mean is that migrating non idle tasks first is not a universal
win and not always what we want.


But migrating online tasks first is mostly a trade-off that
non-idle(Latency Sensitive) tasks can obtain more CPU time and minimize
the interference caused by IDLE tasks. I think this makes sense in most
cases, or you can point out what else I need to think about it ?

Best regards.



Testcase:
- Spawn large number of idle(SCHED_IDLE) tasks occupy CPUs

What do you mean by a large number ?

- Let non-idle tasks compete with idle tasks for CPU time.

Using schbench to test non-idle tasks latency:
$ ./schbench -m 1 -t 10 -r 30 -R 200

How many CPUs do you have ?


OK, some details may not be mentioned.
My virtual machine has 8 CPUs running with a schbench process and 5000
idle tasks. The idle task is a while dead loop process below:

How can you care about latency when you start 10 workers on 8 vCPUs
with 5000 non idle threads ?


No no no... spawn 5000 idle(SCHED_IDLE) processes not 5000 non-idle
threads, and with 10 non-idle schbench workers on 8 vCPUs.

yes spawn 5000 idle tasks but my point remains the same


I am so sorry that I have not received your reply for a long time, and I am still waiting for it anxiously. In fact, migrating non-idle tasks 1st works well in most scenarios, so it maybe possible to add a sched_feat(LB_PRIO) to enable or disable that. Finally, I really hope you can give me some better advice.

Best regards.

Song Zhang