[PATCH] sched/uclamp: Fix iowait boost UCLAMP_MAX escape

From: Christian Loehle
Date: Tue Mar 26 2024 - 14:01:12 EST


A task, regardless of its UCLAMP_MAX value, was previously allowed to
build up sg_cpu->iowait_boost to SCHED_CAPACITY_SCALE while enqueued.
Since the boost was only uclamped when applied, this led to sugov
iowait boosting the rq while the task was dequeued.

The fix introduced by
commit d37aee9018e6 ("sched/uclamp: Fix iowait boost escaping uclamp restriction")
added the uclamp check before the boost is applied. Unfortunately
that is insufficient: the iowait_boost may be built up purely by
a task with UCLAMP_MAX set, but since this task is often in_iowait,
its clamps are not active during the in_iowait periods.
So another task (say, one with low utilization) may immediately
receive the iowait_boost value that was built up under UCLAMP_MAX
restrictions.

The issue is less prevalent than the above might suggest: if
dequeuing the UCLAMP_MAX task leaves the CPU idle, the previous
UCLAMP_MAX value is preserved by uclamp_idle_value().
Nonetheless, anything enqueued on the rq during the in_iowait
phase will wrongly receive the iowait_boost.
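
The build-up described above can be sketched as a small userspace
model. This is only an illustration, not kernel code: struct model_cpu
and the model_* helpers are hypothetical names, the model assumes a CPU
of full capacity (so no arch_scale_cpu_capacity() scaling), and it
ignores the pending flag and boost decay.

```c
#include <assert.h>

/* Values mirror SCHED_CAPACITY_SCALE and IOWAIT_BOOST_MIN. */
#define MODEL_CAPACITY_SCALE	1024u
#define MODEL_BOOST_MIN		(MODEL_CAPACITY_SCALE / 8)

/* Hypothetical stand-in for the relevant sugov_cpu and rq state. */
struct model_cpu {
	unsigned int iowait_boost;	/* boost built up so far */
	unsigned int rq_uclamp_max;	/* stand-in for uclamp_rq_get(rq, UCLAMP_MAX) */
};

/* Next boost request: start at the minimum, then double up to full scale. */
static unsigned int model_next_boost(const struct model_cpu *c)
{
	unsigned int next;

	if (!c->iowait_boost)
		return MODEL_BOOST_MIN;

	next = c->iowait_boost << 1;
	return next < MODEL_CAPACITY_SCALE ? next : MODEL_CAPACITY_SCALE;
}

/* Build-up that never consults UCLAMP_MAX (clamp only at apply time). */
static void model_wakeup_unclamped(struct model_cpu *c)
{
	c->iowait_boost = model_next_boost(c);
}

/* Build-up that drops any request above the rq's UCLAMP_MAX. */
static void model_wakeup_clamped(struct model_cpu *c)
{
	unsigned int next = model_next_boost(c);

	assert(next <= MODEL_CAPACITY_SCALE);
	if (c->rq_uclamp_max < next)
		return;

	c->iowait_boost = next;
}
```

In this model, four iowait wakeups with rq_uclamp_max of 0 take the
unclamped variant to 1024, while the clamped variant stays at 0.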

This can be observed with a basic single-threaded benchmark running
with UCLAMP_MAX of 0: the iowait_boost is then triggered by the
occasional kworker.
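
One way to give a benchmark task a UCLAMP_MAX of 0 is the
sched_setattr() syscall. The sketch below is an assumption about how
such a setup could look, not the benchmark actually used here:
struct demo_sched_attr is a local mirror of the UAPI struct sched_attr,
and the demo_* helpers are hypothetical names. It needs a kernel built
with CONFIG_UCLAMP_TASK.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag values from include/uapi/linux/sched.h, guarded in case libc has them. */
#ifndef SCHED_FLAG_KEEP_POLICY
#define SCHED_FLAG_KEEP_POLICY		0x08
#endif
#ifndef SCHED_FLAG_KEEP_PARAMS
#define SCHED_FLAG_KEEP_PARAMS		0x10
#endif
#ifndef SCHED_FLAG_UTIL_CLAMP_MAX
#define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
#endif

/* Local mirror of the UAPI layout; renamed to avoid clashing with libc. */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/* Build an attr that changes only UCLAMP_MAX, keeping policy and params. */
struct demo_sched_attr demo_uclamp_max_attr(uint32_t util_max)
{
	struct demo_sched_attr attr;

	assert(sizeof(attr) == 56);	/* SCHED_ATTR_SIZE_VER1 */
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_flags = SCHED_FLAG_KEEP_POLICY | SCHED_FLAG_KEEP_PARAMS |
			   SCHED_FLAG_UTIL_CLAMP_MAX;
	attr.sched_util_max = util_max;
	return attr;
}

/* Apply to @pid (0 means the caller); returns 0, or -1 with errno set. */
int demo_set_uclamp_max(pid_t pid, uint32_t util_max)
{
	struct demo_sched_attr attr = demo_uclamp_max_attr(util_max);

	return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

Calling demo_set_uclamp_max(0, 0) at the start of the benchmark would
request a UCLAMP_MAX of 0 for the calling thread; checking the return
value is left to the caller.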

Fixes: 982d9cdc22c9 ("sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks")
Signed-off-by: Christian Loehle <christian.loehle@xxxxxxx>
---
kernel/sched/cpufreq_schedutil.c | 36 +++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index eece6244f9d2..bfd79762b28d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -205,6 +205,25 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
sg_cpu->util = sugov_effective_cpu_perf(sg_cpu->cpu, util, min, max);
}

+/**
+ * sugov_iowait_clamp() - Clamp the boost with UCLAMP_MAX
+ * @sg_cpu: the sugov data for the CPU
+ * @boost: the requested new boost
+ *
+ * Clamps the iowait boost according to the rq's UCLAMP_MAX restriction.
+ */
+static void sugov_iowait_clamp(struct sugov_cpu *sg_cpu, unsigned int boost)
+{
+#ifdef CONFIG_UCLAMP_TASK
+ unsigned int boost_scaled = (boost *
+ arch_scale_cpu_capacity(sg_cpu->cpu)) >> SCHED_CAPACITY_SHIFT;
+
+ if (uclamp_rq_get(cpu_rq(sg_cpu->cpu), UCLAMP_MAX) < boost_scaled)
+ return;
+#endif
+ sg_cpu->iowait_boost = boost;
+ sg_cpu->iowait_boost_pending = true;
+}
/**
* sugov_iowait_reset() - Reset the IO boost status of a CPU.
* @sg_cpu: the sugov data for the CPU to boost
@@ -225,8 +244,8 @@ static bool sugov_iowait_reset(struct sugov_cpu *sg_cpu, u64 time,
if (delta_ns <= TICK_NSEC)
return false;

- sg_cpu->iowait_boost = set_iowait_boost ? IOWAIT_BOOST_MIN : 0;
- sg_cpu->iowait_boost_pending = set_iowait_boost;
+ if (set_iowait_boost)
+ sugov_iowait_clamp(sg_cpu, IOWAIT_BOOST_MIN);

return true;
}
@@ -249,6 +268,7 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
unsigned int flags)
{
bool set_iowait_boost = flags & SCHED_CPUFREQ_IOWAIT;
+ unsigned int iowait_boost;

/* Reset boost if the CPU appears to have been idle enough */
if (sg_cpu->iowait_boost &&
@@ -262,17 +282,17 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
/* Ensure boost doubles only one time at each request */
if (sg_cpu->iowait_boost_pending)
return;
- sg_cpu->iowait_boost_pending = true;

/* Double the boost at each request */
if (sg_cpu->iowait_boost) {
- sg_cpu->iowait_boost =
- min_t(unsigned int, sg_cpu->iowait_boost << 1, SCHED_CAPACITY_SCALE);
- return;
+ iowait_boost = min_t(unsigned int, sg_cpu->iowait_boost << 1,
+ SCHED_CAPACITY_SCALE);
+ } else {
+ /* First wakeup after IO: start with minimum boost */
+ iowait_boost = IOWAIT_BOOST_MIN;
}

- /* First wakeup after IO: start with minimum boost */
- sg_cpu->iowait_boost = IOWAIT_BOOST_MIN;
+ sugov_iowait_clamp(sg_cpu, iowait_boost);
}

/**
--
2.34.1