Re: [PATCH v2 1/1] sched/uclamp: add SCHED_FLAG_UTIL_CLAMP_RESET flag to reset uclamp

From: Patrick Bellasi
Date: Wed Oct 14 2020 - 10:51:38 EST



On Tue, Oct 13, 2020 at 22:25:48 +0200, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote...

Hi Dietmar,

> Hi Yun,
>
> On 12/10/2020 18:31, Yun Hsiang wrote:
>> If the user wants to stop controlling uclamp and let the task inherit
>> the value from the group, we need a method to reset.
>>
>> Add SCHED_FLAG_UTIL_CLAMP_RESET flag to allow the user to reset uclamp via
>> sched_setattr syscall.
>
> before we decide on how to implement the 'uclamp user_defined reset'
> feature, could we come back to your use case in
> https://lkml.kernel.org/r/20201002053812.GA176142@ubuntu ?
>
> Let's just consider uclamp min for now. We have:
>
> (1) system-wide:
>
> # cat /proc/sys/kernel/sched_util_clamp_min
>
> 1024
>
> (2) tg (hierarchy) with top-app's cpu.uclamp.min to ~200 (20% of 1024):
>
> # cat /sys/fs/cgroup/cpu/top-app/cpu.uclamp.min
> 20
>
> (3) and 2 cfs tasks A and B in top-app:
>
> # cat /sys/fs/cgroup/cpu/top-app/tasks
>
> pid_A
> pid_B
>
> Then you set A and B's uclamp min to 100. A and B are now user_defined.
> A and B's effective uclamp min value is 100.
>
> Since the task uclamp min values (3) are less than (1) and (2), their
> uclamp min value is not affected by (1) or (2).
>
> If A doesn't want to control itself anymore, it can set its uclamp min
> to e.g. 300. Now A's effective uclamp min value is ~200, i.e. controlled
> by (2), the one of B stays 100.
>
> So the policy is:
>
> (a) If the user_defined task wants to control its uclamp, use task
> uclamp value less than the tg (hierarchy) (and the system-wide)
> value.
>
> (b) If the user_defined task doesn't want to control its uclamp
> anymore, use a uclamp value greater than or equal to the tg (hierarchy)
> (and the system-wide) value.
>
> So where exactly is the use case which would require a 'uclamp
> user_defined reset' functionality?

Not sure what specific use-case Yun is after, but I have at least
one in mind.

Let's say a task does not need any boost at all, regardless of the
cgroup it's configured to run in. We can go on and set its
task-specific value to util_min=0.

In this case, when the task is running alone on a CPU, it will always
get the minimum OPP, regardless of its utilization.
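
For reference, a minimal userspace sketch of that util_min=0 request.
The flag values and the struct layout mirror the uclamp uapi in
include/uapi/linux/sched.h; the names uclamp_sched_attr,
fill_util_min_request() and set_task_util_min() are made up for this
example, and the struct is a local copy so that it does not clash with
libc headers:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag values as in include/uapi/linux/sched.h (uclamp uapi). */
#ifndef SCHED_FLAG_KEEP_POLICY
#define SCHED_FLAG_KEEP_POLICY		0x08
#endif
#ifndef SCHED_FLAG_KEEP_PARAMS
#define SCHED_FLAG_KEEP_PARAMS		0x10
#endif
#ifndef SCHED_FLAG_UTIL_CLAMP_MIN
#define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
#endif

/* Local mirror of the kernel's struct sched_attr (56 bytes). */
struct uclamp_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/* Hypothetical helper: build a "change only util_min" request. */
void fill_util_min_request(struct uclamp_sched_attr *attr, uint32_t util_min)
{
	assert(util_min <= 1024);	/* clamp values live in [0..1024] */

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	/* Keep policy and params untouched, update only the min clamp. */
	attr->sched_flags = SCHED_FLAG_KEEP_POLICY |
			    SCHED_FLAG_KEEP_PARAMS |
			    SCHED_FLAG_UTIL_CLAMP_MIN;
	attr->sched_util_min = util_min;
}

/* Hypothetical helper: returns 0 on success, -1 with errno set otherwise. */
int set_task_util_min(pid_t pid, uint32_t util_min)
{
	struct uclamp_sched_attr attr;

	fill_util_min_request(&attr, util_min);
	return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

Calling set_task_util_min(pid, 0) marks the task's util_min as
user_defined with value 0. The call fails on kernels built without
uclamp support, so the return value needs checking.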

Now, after a while (e.g. some special event happens) we want to relax
this constraint and allow the task to run:
1. at whatever OPP is required by its utilization
2. with any additional boost possibly enforced by its cgroup

Right now we have only rather cumbersome or hacky solutions:
a) go check the current cgroup util_min value and set something
higher than that for the task
b) set task::util_min=1024, thus asking for the maximum possible boost

Solution a) is more code for userspace and it's also racy, since the
cgroup value can change after userspace has read it. Solution b)
is misleading since the task does not really want to run at 1024.
It's also potentially overkill if the task gets moved to the root
group, which is normally unbounded: the task would then always run
at the max OPP for no specific reason.

A simple _UCLAMP_RESET flag will allow user-space to easily switch a
task back to the default behavior (follow utilization or recommended
boosts), which is what a task usually gets when it does not opt in to
uclamp.
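
To make the difference concrete, here is a small model of how the
effective util_min comes out for a task in a non-root cgroup. This is a
simplified sketch of the behavior discussed in this thread, not the
kernel code: struct uclamp_req, effective_util_min() and uclamp_reset()
are illustrative names, and the reset semantics (drop the user_defined
state, keep everything else) are my assumption of what the flag would
do.

```c
#include <stdbool.h>

/* Illustrative model of a per-task clamp request. */
struct uclamp_req {
	unsigned int value;	/* requested clamp, [0..1024] */
	bool user_defined;	/* set once userspace writes a value */
};

/*
 * Effective util_min for a task in a non-root cgroup:
 *  - a user_defined request wins only if it does not exceed the tg value
 *  - otherwise the tg value applies
 *  - the result is always capped by the system-wide value
 */
unsigned int effective_util_min(struct uclamp_req task,
				unsigned int tg_min,
				unsigned int sys_min)
{
	unsigned int v = tg_min;

	if (task.user_defined && task.value <= tg_min)
		v = task.value;

	return v < sys_min ? v : sys_min;
}

/* Assumed reset semantics: the task stops being user_defined. */
struct uclamp_req uclamp_reset(struct uclamp_req req)
{
	req.user_defined = false;
	return req;
}
```

With the tg at 205 (~20% of 1024) and the system-wide value at 1024, a
user_defined request of 0 yields 0 and a request of 1024 yields 205.
The same 1024 request in a group whose own value is 1024 yields 1024,
which is the over-boost of solution b). After uclamp_reset() the task
simply follows the tg value, wherever it is moved.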

Looking forward to seeing if Yun has an even more specific use-case.