Re: [PATCH 1/7] sched/uclamp: Fix relationship between uclamp and migration margin

From: Qais Yousef
Date: Thu Aug 04 2022 - 10:59:26 EST


On 07/22/22 17:13, Vincent Guittot wrote:

[...]

> Using capacity_orig_of(cpu) - thermal_load_avg(rq_of(cpu)) seems like
> a simple solution to cover thermal mitigation.
>
> Also, I was looking more deeply at your condition and had a hard time
> understanding why uclamp_max_fits needs to be false when both
> (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE) hold.
>
> + max_capacity = (capacity_orig == SCHED_CAPACITY_SCALE) &&
> (uclamp_max == SCHED_CAPACITY_SCALE);
> + uclamp_max_fits = !max_capacity && (uclamp_max <= capacity_orig);
> + fits = fits || uclamp_max_fits;
>
> For task I would have done only :
>
> + capacity_orig = capacity_orig_of(cpu) - thermal_load_avg(rq_of(cpu));
> + uclamp_max_fits = (uclamp_max <= capacity_orig);
> fits = fits || uclamp_max_fits;

I just sent v2, so it's worth clarifying what I have considered so far:

uclamp_max shouldn't care about thermal pressure except in the capacity
inversion case. The goal of uclamp_max is to cap the task, and the weak
affinity part of the hint is important to honour. So transient thermal
pressure is not a problem from a fitness point of view: uclamp_max means the
task shouldn't exceed this perf level, and it's okay for it to be capped at a
lower value.

And ignoring the max_capacity check for tasks would actually create problems,
because feec() would wrongly force-fit tasks onto the biggest cores, only for
the overutilized state to be triggered later.

To preserve the current behavior, feec() should bail out and let the other
logic in select_task_rq_fair() fall back to the next best thing.

To do that, we need both call sites to behave the same.

>
> and I would use a different one for cpu_overutilized() in order to discard the
> uclamp_max test when uclamp_max equals SCHED_CAPACITY_SCALE
>
> + uclamp_max_fits = (uclamp_max <= capacity_orig) && (uclamp_max != SCHED_CAPACITY_SCALE);

I opted to keep the logic encapsulated in util_fits_cpu(). I was wary that not
having coherent logic across all call sites would lead to random behavior
changes, especially in the wakeup path.

> and I don't think that we should compare uclamp_min <= capacity_orig for
> cpu_overutilized() but only for tasks, to detect misfit ones, because uclamp_min
> is a performance hint, not a bandwidth, as you said previously.

I'd agree only for the corner case where capacity_orig == SCHED_CAPACITY_SCALE.

But for the other cases it actually defeats the purpose of uclamp_min. If the
user dynamically controls uclamp_min (there are already users in Android), then
we should detect at the tick whether we need to migrate the task to a bigger
CPU; otherwise the new uclamp_min will only be honoured on the next wakeup.

This doesn't contradict the performance-hint nature of uclamp. If a task
requests uclamp_min = 1024, for example, but is already running on a little or
medium CPU, then by not triggering a misfit migration we prevent it from
obtaining the performance level it asked for until the next wakeup, which might
be too late and already hurt the user experience.


Thanks!

--
Qais Yousef