Re: [PATCH] sched/fair: make CFS bandwidth slice per cpu group

From: Cong Wang
Date: Mon Apr 30 2018 - 16:37:42 EST


On Mon, Apr 30, 2018 at 12:42 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Mon, Apr 30, 2018 at 12:29:25PM -0700, Cong Wang wrote:
>> Currently, the sched_cfs_bandwidth_slice_us is a global setting which
>> affects all cgroups. Different groups may want different values based
>> on their own workload, one size doesn't fit all. The global pool filled
>> periodically is per cgroup too, they should have the right to distribute
>> their own quota to each local CPU with their own frequency.
>
> Why.. what happens? This doesn't really tell us anything.

We saw tasks in a container get throttled many times even though
they do not appear to be over-burning the CPUs. I tried reducing
sched_cfs_bandwidth_slice_us from the default 5ms to 1ms, and that
solved the problem: no tasks were throttled after the change.
This is why I want to change it.
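
For reference, a rough sketch of how the throttling can be compared
before and after changing the slice. It assumes the cgroup's cpu.stat
file exposes nr_periods and nr_throttled as "key value" lines; the
helper names and the sample numbers are made up for illustration and
are not measurements from our containers.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats):
    """Fraction of enforcement periods in which the group was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Hypothetical sample content, NOT real data.
SAMPLE = "nr_periods 1000\nnr_throttled 250\nthrottled_time 5000000000\n"

if __name__ == "__main__":
    print(throttled_ratio(parse_cpu_stat(SAMPLE)))
```

Reading the file once before and once after the slice change and
comparing the two ratios is enough to see whether throttling went away.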

And I don't think 1ms will be good for all containers, so in order to
minimize the impact, I would like to keep the slice change within
each container. This is why I propose this patch rather than just
`sysctl -w`. Do you think otherwise?

BTW, people reported a similar (if not the same) issue here before:
https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1

Thanks!