Re: [PATCH v2] sched/fair: Introduce priority load balance to reduce interference from IDLE tasks

From: Vincent Guittot
Date: Tue Aug 23 2022 - 12:53:25 EST


Hi Zhang,

On Mon, 22 Aug 2022 at 08:49, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:
>
> Hi, Vincent,
>
> On 2022/8/20 0:04, Vincent Guittot wrote:
> > On Fri, 19 Aug 2022 at 14:35, Vincent Guittot
> > <vincent.guittot@xxxxxxxxxx> wrote:
> >>
> >> Hi Zhang,
> >>
> >> On Fri, 19 Aug 2022 at 12:54, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:
> >>>
> >>>
> >>> On 2022/8/18 16:31, Vincent Guittot wrote:
> >>>> Le jeudi 18 août 2022 à 10:46:55 (+0800), Abel Wu a écrit :
> >>>>> On 8/17/22 8:58 PM, Vincent Guittot Wrote:
> >>>>>> On Tue, 16 Aug 2022 at 04:53, zhangsong (J) <zhangsong34@xxxxxxxxxx> wrote:
> >>>>>>>
> >>>> ...

...

> >>>
> >>> Thanks for your reply.
> >>> I have tried your patch and run test compared with it, it seems that the
> >>> patch you provide makes no sense.
> >>> The test result is below(1000 idle tasks bounds to CPU 0-1 and 10 normal
> >>> tasks bounds to CPU 1-2):
> >>>
> >>> =================================================================
> >>>
> >>> Without patch:
> >>>
> >>>
> >>> 6,777.37 msec cpu-clock # 1.355 CPUs utilized
> >>> 20,812 context-switches # 0.003 M/sec
> >>> 0 cpu-migrations # 0.000 K/sec
> >>> 0 page-faults # 0.000 K/sec
> >>> 13,333,983,148 cycles # 1.967 GHz
> >>> 6,457,930,305 instructions # 0.48 insn per cycle
> >>> 2,125,644,649 branches # 313.639 M/sec
> >>> 1,690,587 branch-misses # 0.08% of all
> >>> branches
> >>> 5.001931983 seconds time elapsed
> >>>
> >>> With your patch:
> >>>
> >>>
> >>> 6,791.46 msec cpu-clock # 1.358 CPUs utilized
> >>> 20,996 context-switches # 0.003 M/sec
> >>> 0 cpu-migrations # 0.000 K/sec
> >>> 0 page-faults # 0.000 K/sec
> >>> 13,467,573,052 cycles # 1.983 GHz
> >>> 6,516,989,062 instructions # 0.48 insn per cycle
> >>> 2,145,139,220 branches # 315.858 M/sec
> >>> 1,751,454 branch-misses # 0.08% of all
> >>> branches
> >>>
> >>> 5.002274267 seconds time elapsed
> >>>
> >>> With my patch:
> >>>
> >>>
> >>> 7,495.14 msec cpu-clock # 1.499 CPUs utilized
> >>> 23,176 context-switches # 0.003 M/sec
> >>> 309 cpu-migrations # 0.041 K/sec
> >>> 0 page-faults # 0.000 K/sec
> >>> 14,849,083,489 cycles # 1.981 GHz
> >>> 7,180,832,268 instructions # 0.48 insn per cycle
> >>> 2,363,300,644 branches # 315.311 M/sec
> >>> 1,964,169 branch-misses # 0.08% of all
> >>> branches
> >>>
> >>> 5.001713352 seconds time elapsed
> >>> ===============================================================
> >>>
> >>> Obviously, when your patch is applied, the cpu-migrations of normal
> >>> tasks is still 0 and the
> >>> CPU ulization of normal tasks have no improvement compared with no patch
> >>> applied.
> >>> When apply my patch, the cpu-migrations and CPU ulization of normal
> >>> tasks can both improve.
> >>> I cannot explain the result with your patch, you also can test it by
> >>> yourself.
> >>
> >> Do you have more details about the test that your are running ?
> >>
> >> Do cpu0-2 share their cache ?
> >> Which kingd of task are the normal and idle tasks ? always running tasks ?
> >>
> >> I'm going to try to reproduce your problem locally
> >
> > Some details of your UC are missing. I have tried to reproduce your
> > example above:
> > 1000 idle tasks bounds to CPU 0-1 and 10 normal tasks bounds to CPU 1-2
> >
> > Let assume that for any reason, the 10 normal tasks wake up on CPU 1.
> > Then, the thousand of idle tasks are moved to CPU0 by load balance and
> > only normal tasks stay on CPU1. Then load balance will move some
> > normal tasks to CPU2.
> >
> > My only way to reproduce something similar to your example, is to pin
> > the 1000 idle tasks on CPU1 so they can't move to CPU0. Then I can see
> > that load balance reaches loop_max limit and gets hard time moving
> > normal tasks on CPU2. But in this later case, my patch helps to move
> > normal tasks on CPU2. Something is missing in the description of your
> > UC.
> >
> > Sidenote, I have the same kind of problem with 1000 normal task with
> > low priority so it's not a matter of idle vs normal tasks
> >
> > Regards,
> > Vincent
> >
>
> Sorry for my slow reply.
>
> I have found a test case which can more illustrate this problem
> accurately. The test case is below.
>
> 1. a dead foreach loop process run as normal or idle task
> $ cat test.c
> int main(int argc, char **argv)
> {
> int i = 0;
> int duration = atoi(argv[1]);
>
> while(1) {
> usleep(duration);
> for(i = 0; i < 100000; i++) {}
> }
> }
> $ gcc -o test test.c
>
> 2. firstly spawn 500 idle tasks which bounds to CPU 0-2
> 3. secondly spawn 10 normal tasks which also bounds to CPU 0-2
> 4. lastly spawn 500 idle tasks which bounds to CPU 0 only
> 5. perf stat normal tasks and get CPU utilization and cpu-migrations
>
>
> Below is the whole test script.
> $ cat test.sh
> #!/bin/bash
>
> # create normal and idle cgroup path
> mkdir /sys/fs/cgroup/cpu/normal/
> mkdir /sys/fs/cgroup/cpu/idle/

so you put "idle" tasks in a task group and normal tasks in another
one. But both groups have default weight/share so you lose the idle
weight for idle tasks and each group will have half the cpus capacity,
i.e. 1.5 cpus per group. And this is what I have with the current
kernel, with your patch or with my patch.

In fact, there is enough tasks (510) not pinned to cpu0 to make the
system balance correctly

With this group hierarchy, the idle tasks will have the same weight
priority as normal tasks. Is it really what you want ?

tip kernel:
7 673,15 msec task-clock # 1,534 CPUs
utilized
12 662 context-switches # 1,650 K/sec
0 cpu-migrations # 0,000 /sec
5,003493176 seconds time elapsed

your patch:
7 488,35 msec task-clock # 1,497 CPUs
utilized
12 338 context-switches # 1,648 K/sec
3 cpu-migrations # 0,401 /sec
5,003406005 seconds time elapsed

my patch
7 569,57 msec task-clock # 1,513 CPUs
utilized
12 460 context-switches # 1,646 K/sec
0 cpu-migrations # 0,000 /sec
5,003437278 seconds time elapsed


>
> # spawn 500 idle tasks and bind tasks to CPU 0-2
> for ((i = 0; i < 500; i++))
> do
> taskset -c 0-2 ./test 200 &
> pid=$!
> # change to SCHED_IDLE policy
> chrt -i -p 0 $pid
> echo $pid > /sys/fs/cgroup/cpu/idle/tasks
> done
>
> # spawn 10 normal tasks and bind tasks to CPU 0-2
> normal_tasks=
> for ((i = 0; i < 10; i++))
> do
> taskset -c 0-2 ./test 500 &
> pid=$!
> normal_tasks+=$pid,
> echo $pid > /sys/fs/cgroup/cpu/normal/tasks
> done
>
> # spawn 500 idle tasks and bind tasks to CPU 0 only
> for ((i = 0; i < 500; i++))
> do
> taskset -c 0 ./test 200 &
> pid=$!
> # change to SCHED_IDLE policy
> chrt -i -p 0 $pid
> echo $pid > /sys/fs/cgroup/cpu/idle/tasks
> done
>
> # perf stat normal tasks
> perf stat -a -p $normal_tasks sleep 5
> pkill -f test
>
>
> You can try the above case and test it with your patch.
>
> Regards,
> Zhang Song