Re: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected

From: Vincent Guittot
Date: Wed Feb 14 2024 - 10:38:14 EST


On Tue, 13 Feb 2024 at 19:07, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>
> On 2024.02.13 03:27 Vincent wrote:
> > On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >> On 2024.02.11 05:36 Vincent wrote:
> >>> On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >>>> On 2024.02.09.14:11 Vincent wrote:
> >>>>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
> >>>>>>
> >>>>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
> >>>>>> sched/cpufreq: Rework schedutil governor performance estimation
> >>>>>>
> >>>>>> There was previous bisection and suggestion of reversion,
> >>>>>> but I guess it wasn't done in the end. [1]
> >>>>>
> >>>>> This has been fixed with
> >>>>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
> >>>>
> >>>> Okay, thanks. I didn't find that one.
> >>>>
> >>>>>> The regression: reduced maximum CPU frequency is ignored.
> >>
> >> Perhaps I should have said "sometimes ignored".
> >> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
> >> a 100% load on CPU 5, its frequency was sampled 1000 times:
> >> 28.6% of samples were 2.4 GHz.
> >> 71.4% of samples were 4.8 GHz (the max turbo frequency)
> >> The results are highly non-repeatable, for example another sample:
> >> 32.8% of samples were 2.4 GHz.
> >> 76.2% of samples were 4.8 GHz
> >>
> >> Another interesting side note: If load is added to the other CPUs,
> >> the set maximum CPU frequency is enforced.
> >
> > Could you trace cpufreq and pstate ? I'd like to understand how
> > policy->cur can be changed
> > whereas there is this comment in intel_pstate_set_policy():
> > /*
> > * policy->cur is never updated with the intel_pstate driver, but it
> > * is used as a stale frequency value. So, keep it within limits.
> > */
> >
> > but cpufreq_driver_fast_switch() updates it with the freq returned by
> > intel_cpufreq_fast_switch()
>
> Perhaps I should submit a patch clarifying that comment.
> It is true for the "intel_pstate" CPU frequency scaling driver but not for the
> "intel_cpufreq" CPU frequency scaling driver, also known as the intel_pstate
> driver in passive mode. Sorry for any confusion.
>
> I ran the intel_pstate_tracer.py during the test and do observe many, but
> not all, CPUs requesting pstate 48 when the max is set to 24.
> The calling request seems to always be via "fast_switch" path.
> The root issue here appears to be a limit clamping problem for that path.

Yes, I came to a similar conclusion as well. Whatever schedutil asks
for, it should be clamped by cpu->max|min_perf_ratio.
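
A minimal sketch of that clamping, in Python for brevity rather than
kernel C. The function name clamp_perf_ratio and its parameters are
hypothetical and are not taken from intel_pstate.c; only the idea of
bounding the request by the min and max perf ratios comes from the
discussion above.

```python
def clamp_perf_ratio(requested: int, min_perf_ratio: int, max_perf_ratio: int) -> int:
    """Bound a requested P-state ratio to [min_perf_ratio, max_perf_ratio]."""
    if min_perf_ratio > max_perf_ratio:
        raise ValueError("min_perf_ratio must not exceed max_perf_ratio")
    return max(min_perf_ratio, min(requested, max_perf_ratio))


# Case from the report: a request for pstate 48 while the max is set to 24.
# The minimum of 8 is an arbitrary value chosen only for illustration.
clamped = clamp_perf_ratio(48, min_perf_ratio=8, max_perf_ratio=24)
```

With such a clamp applied on every path, a request for pstate 48 would
be reduced to the configured max whichever callback delivered it.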

Do you know whether you are using the fast_switch or the adjust_perf callback?

> I'll try to attach a couple of graphs and screen shots from the tracer data.
>
> I do not know how to trace cpufreq at the same time.

I was thinking of enabling the cpufreq traces in ftrace in addition to
the pstate ones that intel_pstate_tracer.py enables.
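
Something along these lines might do it; treat it as a sketch under
assumptions rather than a tested recipe. It assumes tracefs is mounted
at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing) and
that the power:cpu_frequency and power:pstate_sample events are the ones
of interest. The helper name set_events is made up for this example.

```python
from pathlib import Path

# Assumption: tracefs mount point; adjust if it is mounted elsewhere.
TRACEFS = Path("/sys/kernel/tracing")

# Both events belong to the "power" trace event group.
EVENTS = ("power/cpu_frequency", "power/pstate_sample")


def set_events(enable: bool, tracefs: Path = TRACEFS) -> list[Path]:
    """Write 1 or 0 to each event's enable file and return the files written."""
    written = []
    for event in EVENTS:
        enable_file = tracefs / "events" / event / "enable"
        enable_file.write_text("1" if enable else "0")
        written.append(enable_file)
    return written
```

Run as root: call set_events(True) before the test and set_events(False)
afterwards, then read the "trace" file in the same tracefs directory.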

Vincent
>
> ... Doug
>