Re: [PATCH 1/2] sched/schedutil: rework performance estimation

From: Lukasz Luba
Date: Wed Oct 18 2023 - 08:19:35 EST




On 10/18/23 08:04, Beata Michalska wrote:
Hi Vincent,

On Fri, Oct 13, 2023 at 05:14:49PM +0200, Vincent Guittot wrote:
The current method to take into account uclamp hints when estimating the
target frequency can end into situation where the selected target
frequency is finally higher than uclamp hints whereas there are no real
needs. Such cases mainly happen because we are currently mixing the
traditional scheduler utilization signal with the uclamp performance
hints. By adding these 2 metrics, we loose an important information when
it comes to select the target frequency and we have to make some
assumptions which can't fit all cases.

[snip]


diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index b9caa01dfac4..adec808b371a 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -243,7 +243,6 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
scale_cpu = arch_scale_cpu_capacity(cpu);
ps = &pd->table[pd->nr_perf_states - 1];
- max_util = map_util_perf(max_util);
Even though the effective_cpu_util does no longer include the headroom, it is
being applied by sugov further down the line (sugov_effective_cpu_perf).
Won't that bring back the original problem when freq selection within EM is
not align with the one performed by sugov ?

It should be OK here to remove the above line. The map_util_perf()
is done before this em_cpu_energy() call, in the new code in


[snip]

}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 922905194c0c..d4f7b2f49c44 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7628,7 +7628,7 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
for_each_cpu(cpu, pd_cpus) {
unsigned long util = cpu_util(cpu, p, -1, 0);
- busy_time += effective_cpu_util(cpu, util, ENERGY_UTIL, NULL);
+ busy_time += effective_cpu_util(cpu, util, NULL, NULL);
}
eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
@@ -7651,7 +7651,7 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,
for_each_cpu(cpu, pd_cpus) {
struct task_struct *tsk = (cpu == dst_cpu) ? p : NULL;
unsigned long util = cpu_util(cpu, p, dst_cpu, 1);
- unsigned long eff_util;
+ unsigned long eff_util, min, max;
/*
* Performance domain frequency: utilization clamping
@@ -7660,7 +7660,23 @@ eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,
* NOTE: in case RT tasks are running, by default the
* FREQUENCY_UTIL's utilization can be max OPP.
*/
- eff_util = effective_cpu_util(cpu, util, FREQUENCY_UTIL, tsk);
+ eff_util = effective_cpu_util(cpu, util, &min, &max);
+
+ /* Task's uclamp can modify min and max value */
+ if (tsk && uclamp_is_used()) {
+ min = max(min, uclamp_eff_value(p, UCLAMP_MIN));
+
+ /*
+ * If there is no active max uclamp constraint,
+ * directly use task's one otherwise keep max
+ */
+ if (uclamp_rq_is_idle(cpu_rq(cpu)))
+ max = uclamp_eff_value(p, UCLAMP_MAX);
+ else
+ max = max(max, uclamp_eff_value(p, UCLAMP_MAX));
+ }
+
+ eff_util = sugov_effective_cpu_perf(cpu, eff_util, min, max);
This will include the headroom so won't it inflate the util here ?

Yes, that's the goal. It will inflate when needed. Currently, the
problem is that we always inflate (blindly) in the em_cpu_energy().
We don't know if the util value which is comming is from uclamp_max
and the frequency should not be higher, because something want to
clamp it.

The other question would be:
What if the PD has 4 CPUs, the max util found is 500 and is from
uclamp_max, but there is onother util on some other CPU 490?

That CPU is allowed and can have the +20% freq in voting.
In current design we don't punish the whole domain in such
scenario.