Re: [PATCH v3 0/4] sched/fair: Burstable CFS bandwidth controller

From: changhuaixin
Date: Tue Jan 26 2021 - 06:36:07 EST

> On Jan 21, 2021, at 7:04 PM, Huaixin Chang <changhuaixin@xxxxxxxxxxxxxxxxx> wrote:
>
> Changelog
>
> v3:
> 1. Fix another issue reported by test robot.
> 2. Update docs as Randy Dunlap suggested.
>
> v2:
> 1. Fix an issue reported by test robot.
> 2. Rewriting docs. Appreciate any further suggestions or help.
>
> The CFS bandwidth controller limits CPU requests of a task group to
> quota during each period. However, parallel workloads might be bursty
> so that they get throttled. And they are latency sensitive at the same
> time so that throttling them is undesired.
>
> Scaling up period and quota allows greater burst capacity, but it can
> also cause a longer stall until the next refill. Instead, we introduce
> "burst", which lets a task group accumulate unused quota from previous
> periods and spend it when it requests more CPU than quota within a
> period. This allows bursts of CPU time as long as the average requested
> CPU time stays below quota in the long run. The accumulation is capped
> at burst and defaults to 0, so the traditional behaviour is preserved.
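>
> A minimal user-space sketch of the refill idea (illustrative only, not
> the actual kernel diff; function and variable names are made up here):
>
> #include <stdio.h>
>
> typedef unsigned long long u64;
>
> /*
>  * At each period boundary, carry unused runtime over, but never let the
>  * accumulated runtime exceed quota + burst.
>  */
> static u64 refill(u64 runtime, u64 quota, u64 burst)
> {
> 	runtime += quota;
> 	if (runtime > quota + burst)
> 		runtime = quota + burst;
> 	return runtime;
> }
>
> int main(void)
> {
> 	u64 runtime = 0, quota = 700000, burst = 400000;	/* usec */
> 	int i;
>
> 	/* With no consumption, idle periods accumulate runtime up to the cap. */
> 	for (i = 0; i < 4; i++) {
> 		runtime = refill(runtime, quota, burst);
> 		printf("period %d: available runtime %llu us\n", i, runtime);
> 	}
> 	return 0;
> }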
>
> A huge drop in 99th-percentile tail latency, from more than 500ms to
> 27ms, is seen for real Java workloads when using burst. Similar drops
> are seen when testing with schbench too:
>
> echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
> echo 700000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
> echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us
> echo 400000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us
>
> # The average CPU usage is around 500%, which is 200ms CPU time
> # every 40ms.
> ./schbench -m 1 -t 30 -r 60 -c 10000 -R 500
>
> Without burst:
>
> Latency percentiles (usec)
> 50.0000th: 7
> 75.0000th: 8
> 90.0000th: 9
> 95.0000th: 10
> *99.0000th: 933
> 99.5000th: 981
> 99.9000th: 3068
> min=0, max=20054
> rps: 498.31 p95 (usec) 10 p99 (usec) 933 p95/cputime 0.10% p99/cputime 9.33%
>
> With burst:
>
> Latency percentiles (usec)
> 50.0000th: 7
> 75.0000th: 8
> 90.0000th: 9
> 95.0000th: 9
> *99.0000th: 12
> 99.5000th: 13
> 99.9000th: 19
> min=0, max=406
> rps: 498.36 p95 (usec) 9 p99 (usec) 12 p95/cputime 0.09% p99/cputime 0.12%
>
> How much a workload benefits from burstable CFS bandwidth control
> depends on how bursty and how latency-sensitive it is.
>
> Previously, Cong Wang and Konstantin Khlebnikov proposed a similar
> feature:
> https://lore.kernel.org/lkml/20180522062017.5193-1-xiyou.wangcong@xxxxxxxxx/
> https://lore.kernel.org/lkml/157476581065.5793.4518979877345136813.stgit@buzz/
>
> This time we present more latency statistics and handle overflow while
> accumulating.
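>
> One way to picture the overflow handling: clamp the carry-over before
> it can wrap around. A rough illustration (not the actual diff; names
> are invented for the sketch):
>
> #include <assert.h>
>
> typedef unsigned long long u64;
>
> /* Clamp the addition so it can neither wrap around nor exceed the cap. */
> static u64 add_capped(u64 runtime, u64 amount, u64 cap)
> {
> 	if (runtime + amount < runtime || runtime + amount > cap)
> 		return cap;
> 	return runtime + amount;
> }
>
> int main(void)
> {
> 	assert(add_capped(100, 50, 120) == 120);	/* above cap: clamped */
> 	assert(add_capped(~0ULL, 10, ~0ULL) == ~0ULL);	/* wrap detected */
> 	return 0;
> }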
>
> Huaixin Chang (4):
> sched/fair: Introduce primitives for CFS bandwidth burst
> sched/fair: Make CFS bandwidth controller burstable
> sched/fair: Add cfs bandwidth burst statistics
> sched/fair: Add document for burstable CFS bandwidth control
>
> Documentation/scheduler/sched-bwc.rst | 49 +++++++++++--
> include/linux/sched/sysctl.h | 2 +
> kernel/sched/core.c | 126 +++++++++++++++++++++++++++++-----
> kernel/sched/fair.c | 58 +++++++++++++---
> kernel/sched/sched.h | 9 ++-
> kernel/sysctl.c | 18 +++++
> 6 files changed, 232 insertions(+), 30 deletions(-)
>
> --
> 2.14.4.44.g2045bb6

Ping, are there any new comments on this patchset? If there are no other concerns, I think it is ready to be merged.