Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

From: Viresh Kumar
Date: Wed Oct 28 2020 - 18:15:37 EST


On 28-10-20, 19:03, zhuguangqing83 wrote:
> Thanks for your comments. Maybe my description was not clear before.
>
> If I understand correctly, the issue may happen when policy->min/max get
> changed in the time window between get_next_freq() and
> sugov_fast_switch(), or more precisely between
> cpufreq_driver_resolve_freq() and cpufreq_driver_fast_switch().
>
> For example, the first time the schedutil callback gets called from the
> scheduler, we reach get_next_freq() and calculate next_freq. Suppose
> next_freq is 1.0 GHz; then sg_policy->next_freq is updated to 1.0 GHz
> in sugov_update_next_freq(). If policy->min/max get changed right then,
> say policy->min is changed to 1.2 GHz, the final next_freq is 1.2 GHz
> because there is another clamp between policy->min and policy->max in
> cpufreq_driver_fast_switch(). So sg_policy->next_freq (1.0 GHz) is not
> the final next_freq (1.2 GHz).
>
> The second time the schedutil callback gets called from the scheduler,
> there are two issues:
> (1) Suppose policy->min is still 1.2 GHz. We reach get_next_freq() and
> calculate next_freq. Because sg_policy->limits_changed was set to true
> by sugov_limits() and there is a clamp between policy->min and
> policy->max, this time next_freq will be greater than or equal to
> 1.2 GHz; suppose it is exactly 1.2 GHz. Now next_freq is 1.2 GHz and
> sg_policy->next_freq is 1.0 GHz, so the
> "if (sg_policy->next_freq == next_freq)" check is not satisfied and we
> call the cpufreq driver to change the frequency to 1.2 GHz. It is
> already 1.2 GHz, so this change is unnecessary.

This isn't that bad, but ...

> (2) Suppose policy->min was changed back to 1.0 GHz in the meantime. We
> reach get_next_freq() and calculate next_freq; suppose it is also
> 1.0 GHz. Now next_freq is 1.0 GHz and sg_policy->next_freq is also
> 1.0 GHz, so the "if (sg_policy->next_freq == next_freq)" check is
> satisfied and we don't change the frequency. But we should change it to
> 1.0 GHz this time.

This is a real problem we can get into. What about this diff instead?

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 0c5c61a095f6..bf7800e853d3 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
 	if (sg_policy->next_freq == next_freq)
 		return false;

-	sg_policy->next_freq = next_freq;
 	sg_policy->last_freq_update_time = time;

 	return true;
@@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
 			      unsigned int next_freq)
 {
 	if (sugov_update_next_freq(sg_policy, time, next_freq))
-		cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
+		sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
 }

 static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
@@ -124,6 +123,7 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
 	if (!sugov_update_next_freq(sg_policy, time, next_freq))
 		return;

+	sg_policy->next_freq = next_freq;
 	if (!sg_policy->work_in_progress) {
 		sg_policy->work_in_progress = true;
 		irq_work_queue(&sg_policy->irq_work);

--
viresh