high latency in CFS when disable autogroup, bug or not?

From: Li RongQing
Date: Thu Aug 31 2017 - 23:28:52 EST


I have a programe to test CFS latency or fairness on high load,
the test shows CFS has high latency when disable autogroup

the program is in
https://gist.github.com/anonymous/af21ae289cfa5c6310eeac73b7a478ff.
the programe creates many threads, every thread does thing like:

do {
now =gettime()
loop to spend 10 microseconds
delta = gettime()-now

if (delta > max_latency) {
max_latency=delta
printf( max_latency);
}
} while(1)


I run this program on a machine with 48 processor, this programe
created 2000 thread,
then every processor has about 40 pthread, every thread should finish
its 10ms computation in 400ms= 40pthread*10ms
but test result show some thread takes very long time

# ./pipe_test -n 2000
eat cpu with 2000 threads, delay time 1000 ms
648 delay 1037 ms
48923 delay 1038 ms
1810 delay 1349 ms
49142 delay 1482 ms
1728 delay 1574 ms
1518 delay 1713 ms
808 delay 1714 ms
1702 delay 1733 ms
49004 delay 1783 ms
48821 delay 1785 ms
451 delay 1865 ms
990 delay 1910 ms
1626 delay 1957 ms
537 delay 2420 ms
2021 delay 3242 ms
763 delay 3488 ms
622 delay 3614 ms
1887 delay 4474 ms
1267 delay 4924 ms
721 delay 5406 ms
1810 delay 5873 ms
1470 delay 5969 ms
1702 delay 7834 ms
48821 delay 8063 ms
^C

The kernel version is 4.9.23 and I disable autogroup; if autogroup
is enabled, no this issue


ftrace result:

<...>-48821 [003] d... 64151.635476: sched_switch:
prev_comm=a.out prev_pid=48821 prev_prio=120 prev_state=R ==>
next_comm=baas_agent next_pid=17118 next_prio=120
<...>-48821 [003] d... 64151.636531: sched_switch:
prev_comm=a.out prev_pid=48821 prev_prio=120 prev_state=R ==>
next_comm=baas_agent next_pid=17118 next_prio=120
<...>-48821 [003] d... 64151.639570: sched_switch:
prev_comm=a.out prev_pid=48821 prev_prio=120 prev_state=R ==>
next_comm=baas_agent next_pid=17275 next_prio=120
<...>-48821 [003] d... 64159.703051: sched_switch:
prev_comm=a.out prev_pid=48821 prev_prio=120 prev_state=R+ ==>
next_comm=kworker/u97:0 next_pid=36929 next_prio=120
<...>-48821 [003] d... 64159.703091: sched_switch:
prev_comm=a.out prev_pid=48821 prev_prio=120 prev_state=R ==>
next_comm=kworker/u97:0 next_pid=36929 next_prio=120
<...>-48821 [003] d... 64159.703978: sched_switch:
prev_comm=a.out prev_pid=48821 prev_prio=120 prev_state=R ==>
next_comm=baas_agent next_pid=17275 next_prio=120
<...>-48821 [003] d... 64159.705542: sched_switch:
prev_comm=a.out prev_pid=48821 prev_prio=120 prev_state=R ==>
next_comm=baas_agent next_pid=16879 next_prio=120


# grep sched_migrate_task trace|grep 48821
<...>-688 [019] d... 64161.828654: sched_migrate_task:
comm=a.out pid=48821 prio=120 orig_cpu=3 dest_cpu=19
<...>-48821 [019] d... 64161.828862: sched_migrate_task:
comm=a.out pid=49053 prio=120 orig_cpu=43 dest_cpu=19
#