Re: [PATCH] sched/rt: Add a new sysctl to control uclamp_util_min

From: Valentin Schneider
Date: Wed Jan 08 2020 - 20:35:13 EST


On 08/01/2020 18:56, Patrick Bellasi wrote:
> Here you are force setting the task-specific _requests_ to match the
> system-wide _constraints_. This is not required and it's also
> conceptually wrong, since you mix two concepts: requests and
> constraints.
>
> System-default values must never be synchronized with task-specific
> values. This allows to always satisfy task _requests_ when not
> conflicting with system-wide (or task-group) _constraints_.
>
> For example, assuming we have a task with util_min=500 and we keep
> changing the system-wide constraints, we would like the following
> effective clamps to be enforced:
>
> time | system-wide | task-specific | effective clamp
> -----+-------------+---------------+-----------------
> t0 | 1024 | 500 | 500
> t1 | 0 | 500 | 0
> t2 | 200 | 500 | 200
> t3 | 600 | 500 | 500
>
> If the taks should then change it's requested util_min:
>
> time | system-wide | task-specific | effective clamp
> -----+-------------+---------------+----------------
> t4 | 600 | 800 | 600
> t6 | 1024 | 800 | 800
>
> If you force set the task-specific requests to match the system-wide
> constraints, you cannot get the above described behaviors since you
> keep overwriting the task _requests_ with system-wide _constraints_.
>

But is what Qais' proposing really a system-wide *constraint*? What we want
to do here is have a knob for the RT uclamp.min values, because gotomax isn't
viable (for mobile, you know the story!). This leaves user_defined values
alone, so you should be able to reproduce exactly what you described above.
If I take your t3 and t4 examples:

| time | system-wide | rt default | task-specific | user_defined | effective |
|------+-------------+------------+---------------+--------------+-----------|
| t3 | 600 | 1024 | 500 | Y | 500 |
| t4 | 600 | 1024 | 800 | Y | 600 |

If the values were *not* user-defined, then it would depend on the default
knob Qais is introducing:

| time | system-wide | rt default | task-specific | user_defined | effective |
|------+-------------+------------+---------------+--------------+-----------|
| t3 | 600 | 1024 | 1024 | N | 600 |
| t4 | 600 | 0 | 0 | N | 0 |

It's not forcing the task-specific value to the system-wide RT value, it's
just using it as tweakable default. At least that's how I understand it,
did I miss something?

> Thus, requests and contraints must always float independently and
> used to compute the effective clamp at task wakeup time via:
>
> enqueue_task(rq, p, flags)
> uclamp_rq_inc(rq, p)
> uclamp_rq_inc_id(rq, p, clamp_id)
> uclamp_eff_get(p, clamp_id)
> uclamp_tg_restrict(p, clamp_id)
> p->sched_class->enqueue_task(rq, p, flags)
>
> where the task-specific request is restricted considering its task group
> effective value (the constraint).
>
> Do note that the root task group effective value (for cfs) tasks is kept
> in sync with the system default value and propagated down to the
> effective value of all subgroups.
>
> Do note also that the effective value is computed before calling into
> the scheduling class's enqueue_task(). Which means that we have the
> right value in place before we poke sugov.
>
> Thus, a proper implementation of what you need should just
> replicate/generalize what we already do for cfs tasks.
>

Reading

7274a5c1bbec ("sched/uclamp: Propagate system defaults to the root group")

I see "The clamp values are not tunable at the level of the root task group".
This means that, for mobile systems where we want a default uclamp.min of 0
for RT tasks, we would need to create a cgroup for all RT tasks (and tweak
its uclamp.min, but from playing around a bit I see that defaults to 0).

(Would we need CONFIG_RT_GROUP_SCHED for this? IIRC there's a few pain points
when turning it on, but I think we don't have to if we just want things like
uclamp value propagation?)

It's quite more work than the simple thing Qais is introducing (and on both
user and kernel side).