Re: [PATCH 3/3] sched/fair: Consider uclamp for "task fits capacity" checks

From: Valentin Schneider
Date: Thu Nov 21 2019 - 12:22:22 EST


On 21/11/2019 15:30, Quentin Perret wrote:
>> uclamp_util_with() (or uclamp_rq_util_with() ;)) picks the max between the
>> rq-aggregated clamps and the task clamps, which isn't what we want. If the
>> task has a low-ish uclamp.max (e.g. the 300 example from above) but the
>> rq-wide max-aggregated uclamp.max is ~800, we'd clamp using that 800. It
>> makes sense for frequency selection, but not for task placement IMO.
>
> Right, but you could argue that this is in fact a correct behaviour.
> What we want to know is 'is this CPU big enough to meet the capacity
> request if I enqueue p there ?'. And the 'capacity request' is the
> aggregated rq-wide clamped util, IMO.
>
> If enqueuing 'p' on a given CPU will cause the rq-wide clamped util to
> go above the CPU capacity, we want to skip that CPU.
>
> The obvious case is if p's min_clamp is larger than the CPU capacity.

Right, so far we get that with task_fits_capacity().

>
> But similarly, if p's max_clamp is going to be ignored because of
> another task with a larger max_clamp on the same rq, this is relevant
> information too -- the resulting capacity request might be above the
> CPU capacity if p's util_avg is large, so we should probably skip the
> CPU too no ?
>

Hmph thinking is hard but I think I agree with you.


Say you have

rq.cpu_capacity_orig = 1024
rq.util_avg = 300
rq.uclamp.max = 600

p.util_est = 600
p.uclamp.max = 300

If we enqueue p on that rq, we shouldn't go above 600 util (or something
close, depending on the frequency this lets us select). But, AFAICT,
cpu_util_next() will see this as going to 900 util and we'll thus skip this
CPU for this task (because that would make us overutilized). With your
suggested change, we wouldn't skip this CPU. Plus this is what we end up
using in compute_energy(), so this would keep both ends aligned.


I think we have a similar problem for downmigration (with the current patch),
say you have:

rq.cpu_capacity_orig = 462
rq.util_avg = 50
rq.uclamp.max = 100

p.util_est = 512
p.uclamp.max = 200

In this case I think we should get 200 cpu util, but we first do

/* Skip CPUs that will be overutilized. */
util = cpu_util_next(cpu, p, cpu);
cpu_cap = capacity_of(cpu);
if (!fits_capacity(util, cpu_cap))
continue;

which *doesn't* look at the clamps, so we would see ~562 util which doesn't
fit on that small CPU, so we'd skip it. With your approach we would correctly
clamp this to 200 and carry on.


One thing I'd like to point out is if we have tasks with default clamp values
(.min=0, .max=1024) enqueued, we won't "throttle" tasks with low uclamp.max.
So something to keep in mind.

One last thing: this makes CPU selection slightly different from what
wake_cap() will do (that latter only uses uclamp_task_util()), but I think
that is fine.

> Are we gaining anything if we decide to not align the EAS path and the
> sugov path ?
>
> Thanks,
> Quentin
>