Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected

From: Vincent Guittot
Date: Tue Feb 13 2024 - 06:28:34 EST


On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>
> On 2024.02.11 05:36 Vincent wrote:
> > On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >> On 2024.02.09.14:11 Vincent wrote:
> >>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >>>>
> >>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
> >>>>
> >>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
> >>>> sched/cpufreq: Rework schedutil governor performance estimation
> >>>>
> >>>> There was a previous bisection and a suggestion of reversion,
> >>>> but I guess it wasn't done in the end. [1]
> >>>
> >>> This has been fixed with
> >>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
> >>
> >> Okay, thanks. I didn't find that one.
> >>
> >>>> The regression: reduced maximum CPU frequency is ignored.
>
> Perhaps I should have said "sometimes ignored".
> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
> a 100% load on CPU 5, its frequency was sampled 1000 times:
> 28.6% of samples were 2.4 GHz.
> 71.4% of samples were 4.8 GHz (the max turbo frequency)
> The results are highly non-repeatable, for example another sample:
> 32.8% of samples were 2.4 GHz.
> 76.2% of samples were 4.8 GHz
>
> Another interesting side note: If load is added to the other CPUs,
> the set maximum CPU frequency is enforced.
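
For anyone wanting to reproduce the sampling quoted above, a minimal
sketch follows. It assumes the readings come from the standard cpufreq
sysfs file scaling_cur_freq for CPU 5; the report does not say how the
frequency was actually sampled, and the helper names, the sample
interval and the default path are all hypothetical.

```python
import time
from pathlib import Path

# Assumption: CPU 5 and the standard cpufreq sysfs layout.
FREQ_FILE = Path("/sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq")


def read_khz(path=FREQ_FILE):
    """Return the frequency currently reported for the CPU, in kHz."""
    return int(path.read_text().strip())


def sample(count=1000, interval_s=0.01, path=FREQ_FILE):
    """Collect `count` readings, sleeping `interval_s` between them."""
    readings = []
    for _ in range(count):
        readings.append(read_khz(path))
        time.sleep(interval_s)
    return readings


def share_above_cap(samples_khz, cap_khz):
    """Return the percentage of samples that exceed the configured cap."""
    if not samples_khz:
        return 0.0
    over = sum(1 for khz in samples_khz if khz > cap_khz)
    return 100.0 * over / len(samples_khz)
```

With the 2.4 GHz limit from the report, share_above_cap(sample(), 2_400_000)
gives the percentage of readings that ignored the cap.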

Could you trace cpufreq and pstate? I'd like to understand how
policy->cur can be changed, given this comment in
intel_pstate_set_policy():

	/*
	 * policy->cur is never updated with the intel_pstate driver, but it
	 * is used as a stale frequency value. So, keep it within limits.
	 */

but cpufreq_driver_fast_switch() updates it with the freq returned by
intel_cpufreq_fast_switch().
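
A minimal sketch of one way to capture such a trace, assuming tracefs
is mounted at /sys/kernel/tracing and that the power:cpu_frequency and
power:pstate_sample trace events are the ones of interest. The helper
names are hypothetical, and writing the enable files needs root.

```python
from pathlib import Path

# Assumption: tracefs mount point; older setups may use
# /sys/kernel/debug/tracing instead.
TRACEFS = Path("/sys/kernel/tracing")

# Assumption: these two power events cover "cpufreq and pstate".
EVENTS = ("power/cpu_frequency", "power/pstate_sample")


def enable_files(root=TRACEFS, events=EVENTS):
    """Return the per-event 'enable' control files under tracefs."""
    return [root / "events" / event / "enable" for event in events]


def set_events(on, root=TRACEFS, events=EVENTS):
    """Write '1' or '0' to each event's enable file (requires root)."""
    for control in enable_files(root, events):
        control.write_text("1" if on else "0")
```

After set_events(True), the events can be read from
/sys/kernel/tracing/trace_pipe while the load runs; set_events(False)
turns them off again.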

>
> >>
> >>> This seems to be something new.
> >>> schedutil doesn't impact the max_freq and it's up to the cpufreq driver
> >>> to select the final freq, which should stay within the limits
> >>
> >> Okay. All I know is this is the commit that caused the regression.
> >
> > Could you check if the fix solved your problem ?
>
> Given the tags for that commit:
>
> $ git tag --contains e37617c8e53a
> v6.8-rc1
> v6.8-rc2
> v6.8-rc3
>
> It does not solve the issue I have raised herein, as it exists in v6.8-rc1 but not v6.7
>
> >> I do not know why, but I do wonder if there could be any relationship with
> >> the old, never fixed, problem of incorrect stale frequencies reported
> >> under the same operating conditions. See the V2 note:
> >> https://lore.kernel.org/all/001d01d9d3a7$71736f50$545a4df0$@telus.net/
> >
> > IIUC the problem is that policy->cur is not used by intel_cpufreq and
> > stays set to the last old/init value.
>
> Yes, exactly.
>
> > Do I get it right that this is only informative ?
>
> I don't know, that is what I was wondering. I do not know if the two issues
> are related or not.
>
> > Normally cpufreq governor checks the new limits and updates current
> > freq if necessary except when fast switch is enabled.
>
> >> where I haven't been able to figure out a solution.
>
> >>>> Conditions:
> >>>> CPU frequency scaling driver: intel_cpufreq (a.k.a. intel_pstate in passive mode)
> >>>> CPU frequency scaling governor: schedutil
> >>>> HWP (HardWare Pstate) control (a.k.a. Intel_speedshift): Enabled
> >>>> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> >>>>
> >>>> I did not check any other conditions, i.e. HWP disabled or the acpi-cpufreq driver.
>
> Changing from HWP enabled to HWP disabled, it works properly.
>
> ...
>
> >>>> [1] https://lore.kernel.org/all/CAKfTPtDCQuJjpi6=zjeWPcLeP+ZY5Dw7XDrZ-LpXqEAAUbXLhA@xxxxxxxxxxxxxx/
>
>