Re: [RFC PATCH v2 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY feature (v2)

From: Mathieu Desnoyers
Date: Mon Oct 23 2023 - 11:04:48 EST


On 2023-10-23 10:11, Dietmar Eggemann wrote:
On 19/10/2023 18:05, Mathieu Desnoyers wrote:

[...]
+static unsigned long scale_rt_capacity(int cpu);
+
+/*
+ * Returns true if adding the task utilization to the estimated
+ * utilization of the runnable tasks on @cpu does not exceed the
+ * capacity of @cpu.
+ *
+ * This considers only the utilization of _runnable_ tasks on the @cpu
+ * runqueue, excluding blocked and sleeping tasks. This is achieved by
+ * using the runqueue util_est.enqueued.
+ */
+static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
+ int cpu)

This is almost like the existing task_fits_cpu(p, cpu) (used in Capacity
Aware Scheduling (CAS) for Asymmetric CPU capacity systems) except the
latter only uses `util = task_util_est(p)` and deals with uclamp as well
and only tests whether p could fit on the CPU.

This is indeed a major difference between how asym capacity check works and what is introduced here:

asym capacity check only checks whether the given task theoretically fits in the cpu if that cpu was completely idle, without considering the current cpu utilization.

My approach is to consider the current util_est of the cpu to check whether the task fits in the remaining capacity.

I did not want to use the existing task_fits_cpu() helper because the notions of uclamp bounds appear to be heavily tied to the fact that it checks whether the task fits in an _idle_ runqueue, whereas the check I am introducing here is much more restrictive: it checks that the task fits on the runqueue within the remaining capacity.


Or like find_energy_efficient_cpu() (feec(), used in
Energy-Aware-Scheduling (EAS)) which uses cpu_util(cpu, p, cpu, 0) to get:

max(util_avg(CPU + p), util_est(CPU + p))

I've tried using cpu_util(), but unfortunately anything that considers blocked/sleeping tasks in its utilization total does not work for my use-case.

From cpu_util():

* CPU utilization is the sum of running time of runnable tasks plus the
* recent utilization of currently non-runnable tasks on that CPU.


feec()
...
for (; pd; pd = pd->next)
...
util = cpu_util(cpu, p, cpu, 0);
...
fits = util_fits_cpu(util, util_min, util_max, cpu)
^^^^^^^^^^^^^^^^^^
not used when uclamp is not active (1)
...
capacity = capacity_of(cpu)
fits = fits_capacity(util, capacity)
if (!uclamp_is_used()) (1)
return fits

So not introducing new functions like task_fits_remaining_cpu_capacity()
in this area and using existing one would be good.

If the notion of uclamp is not tied to the way asym capacity check is done against a theoretically idle runqueue, I'd be OK with using this, but so far both appear to be very much tied.

When I stumbled on this fundamental difference between asym cpu capacity check and the check introduced here, I've started wondering whether the asym cpu capacity check would benefit from considering the target cpu current utilization as well.


+{
+ unsigned long total_util;
+
+ if (!sched_util_fits_capacity_active())
+ return false;
+ total_util = READ_ONCE(cpu_rq(cpu)->cfs.avg.util_est.enqueued) + task_util;
+ return fits_capacity(total_util, scale_rt_capacity(cpu));

Why not use:

static unsigned long capacity_of(int cpu)
return cpu_rq(cpu)->cpu_capacity;

which is maintained in update_cpu_capacity() as scale_rt_capacity(cpu)?

The reason for preferring scale_rt_capacity(cpu) over capacity_of(cpu) is that update_cpu_capacity() only runs periodically every balance-interval, therefore providing a coarse-grained remaining capacity approximation with respect to irq, rt, dl, and thermal utilization.

If it turns out that being coarse-grained is good enough, we may be able to save some cycles by using capacity_of(), but not without carefully considering the impacts of being imprecise.


[...]

@@ -7173,7 +7200,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
if (recent_used_cpu != prev &&
recent_used_cpu != target &&
cpus_share_cache(recent_used_cpu, target) &&
- (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
+ (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu) ||
+ task_fits_remaining_cpu_capacity(task_util, recent_used_cpu)) &&
cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
return recent_used_cpu;
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ee7f23c76bd3..9a84a1401123 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -97,6 +97,12 @@ SCHED_FEAT(WA_BIAS, true)
SCHED_FEAT(UTIL_EST, true)
SCHED_FEAT(UTIL_EST_FASTUP, true)

IMHO, asymmetric CPU capacity systems would have to disable the sched
feature UTIL_FITS_CAPACITY. Otherwise CAS could deliver different
results. task_fits_remaining_cpu_capacity() and asym_fits_cpu() work
slightly different.

I don't think they should be mutually exclusive. We should look into the differences between those two more closely to make them work nicely together instead. For instance, why does asym capacity only consider whether tasks fit in a theoretically idle runqueue, when it could use the current utilization of the runqueue to check that the task fits in the remaining capacity ?

Unfortunately I don't have a machine with asym cpu to test locally.

Thanks for your feedback !

Mathieu



[...]


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com