Re: [PATCH v2 1/1] sched/uclamp: add SCHED_FLAG_UTIL_CLAMP_RESET flag to reset uclamp

From: Yun Hsiang
Date: Wed Oct 14 2020 - 11:00:44 EST


On Tue, Oct 13, 2020 at 10:25:48PM +0200, Dietmar Eggemann wrote:
> Hi Yun,
>
> On 12/10/2020 18:31, Yun Hsiang wrote:
> > If the user wants to stop controlling uclamp and let the task inherit
> > the value from the group, we need a method to reset.
> >
> > Add SCHED_FLAG_UTIL_CLAMP_RESET flag to allow the user to reset uclamp via
> > sched_setattr syscall.
>
> before we decide on how to implement the 'uclamp user_defined reset'
> feature, could we come back to your use case in
> https://lkml.kernel.org/r/20201002053812.GA176142@ubuntu ?
>
> Lets just consider uclamp min for now. We have:
>
> (1) system-wide:
>
> # cat /proc/sys/kernel/sched_util_clamp_min
>
> 1024
>
> (2) tg (hierarchy) with top-app's cpu.uclamp.min to ~200 (20% of 1024):
>
> # cat /sys/fs/cgroup/cpu/top-app/cpu.uclamp.min
> 20
>
> (3) and 2 cfs tasks A and B in top-app:
>
> # cat /sys/fs/cgroup/cpu/top-app/tasks
>
> pid_A
> pid_B
>
> Then you set A and B's uclamp min to 100. A and B are now user_defined.
> A and B's effective uclamp min value is 100.
>
> Since the task uclamp min values (3) are less than (1) and (2), their
> uclamp min value is not affected by (1) or (2).
>
> If A doesn't want to control itself anymore, it can set its uclamp min
> to e.g. 300. Now A's effective uclamp min value is ~200, i.e. controlled
> by (2), the one of B stays 100.

The tg uclamp value may also change. If top-app's cpu.uclamp.min change
to 50 (~500), then task A's effective uclamp min value is 300 not ~500.
We can set task A's uclamp to 1024, it will be restricted by the tg.
But when task A move to root group, it's effective uclamp min value
will be 1024 not 0. If a task is in root group and it doesn't want to
control it's uclamp, the effective uclamp min value of that task should be 0.
So I think reset functionality is needed.

>
> So the policy is:
>
> (a) If the user_defined task wants to control it's uclamp, use task
> uclamp value less than the tg (hierarchy) (and the system-wide)
> value.
>
> (b) If the user_defined task doesn't want to control it's uclamp
> anymore, use a uclamp value greater than or equal the tg (hierarchy)
> (and the system-wide) value.
>
> So where exactly is the use case which would require a 'uclamp
> user_defined reset' functionality?