Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

From: zhuguangqing83
Date: Thu Oct 29 2020 - 07:17:22 EST


On 10/29/2020 15:19,Viresh Kumar<viresh.kumar@xxxxxxxxxx> wrote:
> Your mail client is screwing the "In-reply-to" field of the message
> and that prevents it to appear properly in the thread in mailboxes of
> other people, please fix that.
>

I will try to fix that.

> On 29-10-20, 09:43, zhuguangqing83 wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 0c5c61a095f6..bf7800e853d3 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > if (sg_policy->next_freq == next_freq)
> > > return false;
> > >
> > > - sg_policy->next_freq = next_freq;
> > > sg_policy->last_freq_update_time = time;
> > >
> > > return true;
> >
> > It's a little strange that sg_policy->next_freq and
> > sg_policy->last_freq_update_time are not updated at the same time.
> >
> > > @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> > > unsigned int next_freq)
> > > {
> > > if (sugov_update_next_freq(sg_policy, time, next_freq))
> > > - cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > > + sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > > }
> > >
> >
> > Great, it also takes into account the issue that 0 is returned by the
> > driver's ->fast_switch() callback to indicate an error condition.
>
> Yes but even my change wasn't good enough, more on it later.
>
> > For policy->min/max may be not the real CPU frequency in OPPs, so
>
> No, that can't happen. If userspace tries to set a value too large or
> too small, we clamp that too to policy->max/min and so the below
> problem shall never occur.
>
> > next_freq got from get_next_freq() which is after clamping between
> > policy->min and policy->max may be not the real CPU frequency in OPPs.
> > In that case, if we use a real CPU frequency in OPPs returned from
> > cpufreq_driver_fast_switch() to compare with next_freq,
> > "if (sg_policy->next_freq == next_freq)" will never be satisfied, so we
> > change the CPU frequency every time schedutil callback gets called from
> > the scheduler. I see the current code in get_next_freq() as following,
> > the issue mentioned above should not happen. Maybe it's just one of my
> > unnecessary worries.
>
> Coming back to my patch (and yours too), it only fixes the fast-switch
> case and not the slow path which can also end up clamping the
> frequency. And to be honest, even the drivers can have their own
> clamping code in place, no one is stopping them too.
>
> And we also need to do something about the cached_raw_freq as well, as
> it will not be in sync with next_freq anymore.
>
> Here is another attempt from me tackling this problem. The idea is to
> check if the previous freq update went as expected or not. And if not,
> we can't rely on next_freq or cached_raw_freq anymore. For this to
> work properly, we need to make sure policy->cur isn't getting updated
> at the same time when get_next_freq() is running. For that I have
> given a different meaning to work_in_progress flag, which now creates
> a lockless (kind of) critical section where we won't play with
> next_freq while the cpufreq core is updating the frequency.
>

I think your patch is ok for tackling this problem.

> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 0c5c61a095f6..8991cc31b011 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -121,13 +121,8 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
> unsigned int next_freq)
> {
> - if (!sugov_update_next_freq(sg_policy, time, next_freq))
> - return;
> -
> - if (!sg_policy->work_in_progress) {
> - sg_policy->work_in_progress = true;
> + if (sugov_update_next_freq(sg_policy, time, next_freq))
> irq_work_queue(&sg_policy->irq_work);
> - }
> }
>
> /**
> @@ -159,6 +154,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> unsigned int freq = arch_scale_freq_invariant() ?
> policy->cpuinfo.max_freq : policy->cur;
>
> + /*
> + * The previous frequency update didn't go as we expected it to be. Lets
> + * start again to make sure we don't miss any updates.
> + */
> + if (unlikely(policy->cur != sg_policy->next_freq)) {
> + sg_policy->next_freq = 0;
> + sg_policy->cached_raw_freq = 0;
> + }
> +
> freq = map_util_freq(util, freq, max);
>
> if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> @@ -337,8 +341,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>
> ignore_dl_rate_limit(sg_cpu, sg_policy);
>
> + if (!sg_policy->policy->fast_switch_enabled) {
> + raw_spin_lock(&sg_policy->update_lock);
> + if (sg_policy->work_in_progress)
> + goto unlock;
> + }
> +

Maybe it's better to bring the following code before the code above.
if (!sugov_should_update_freq(sg_policy, time))
return;

> if (!sugov_should_update_freq(sg_policy, time))
> - return;
> + goto unlock;
>
> /* Limits may have changed, don't skip frequency update */
> busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
> @@ -363,13 +373,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> * concurrently on two different CPUs for the same target and it is not
> * necessary to acquire the lock in the fast switch case.
> */
> - if (sg_policy->policy->fast_switch_enabled) {
> + if (sg_policy->policy->fast_switch_enabled)
> sugov_fast_switch(sg_policy, time, next_f);
> - } else {
> - raw_spin_lock(&sg_policy->update_lock);
> + else
> sugov_deferred_update(sg_policy, time, next_f);
> +
> +unlock:
> + if (!sg_policy->policy->fast_switch_enabled)
> raw_spin_unlock(&sg_policy->update_lock);
> - }
> }
>
> static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> @@ -405,6 +416,9 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>
> raw_spin_lock(&sg_policy->update_lock);
>
> + if (sg_policy->work_in_progress)
> + goto unlock;
> +
> sugov_iowait_boost(sg_cpu, time, flags);
> sg_cpu->last_update = time;
>
> @@ -419,33 +433,30 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> sugov_deferred_update(sg_policy, time, next_f);
> }
>
> +unlock:
> raw_spin_unlock(&sg_policy->update_lock);
> }
>
> static void sugov_work(struct kthread_work *work)
> {
> struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> - unsigned int freq;
> unsigned long flags;
>
> /*
> - * Hold sg_policy->update_lock shortly to handle the case where:
> - * incase sg_policy->next_freq is read here, and then updated by
> - * sugov_deferred_update() just before work_in_progress is set to false
> - * here, we may miss queueing the new update.
> - *
> - * Note: If a work was queued after the update_lock is released,
> - * sugov_work() will just be called again by kthread_work code; and the
> - * request will be proceed before the sugov thread sleeps.
> + * Prevent the schedutil hook to run in parallel while we are updating
> + * the frequency here and accessing next_freq.
> */
> raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> - freq = sg_policy->next_freq;
> - sg_policy->work_in_progress = false;
> + sg_policy->work_in_progress = true;
> raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>
> mutex_lock(&sg_policy->work_lock);
> - __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> + __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, CPUFREQ_RELATION_L);
> mutex_unlock(&sg_policy->work_lock);
> +
> + raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> + sg_policy->work_in_progress = false;
> + raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> }
>
> static void sugov_irq_work(struct irq_work *irq_work)
>
> --
> viresh