RE: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected

From: Doug Smythies
Date: Tue Feb 13 2024 - 13:07:52 EST

On 2024.02.13 03:27 Vincent wrote:
> On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>> On 2024.02.11 05:36 Vincent wrote:
>>> On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>>>> On 2024.02.09.14:11 Vincent wrote:
>>>>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>>>>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
>>>>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
>>>>>> sched/cpufreq: Rework schedutil governor performance estimation
>>>>>> There was a previous bisection and a suggestion to revert,
>>>>>> but I guess it wasn't done in the end. [1]
>>>>> This has been fixed with
>>>> Okay, thanks. I didn't find that one.
>>>>>> The regression: reduced maximum CPU frequency is ignored.
>> Perhaps I should have said "sometimes ignored".
>> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
>> a 100% load on CPU 5, its frequency was sampled 1000 times:
>> 28.6% of samples were 2.4 GHz.
>> 71.4% of samples were 4.8 GHz (the max turbo frequency).
>> The results are highly non-repeatable, for example another sample:
>> 32.8% of samples were 2.4 GHz.
>> 67.2% of samples were 4.8 GHz.
>> Another interesting side note: If load is added to the other CPUs,
>> the set maximum CPU frequency is enforced.
> Could you trace cpufreq and pstate? I'd like to understand how
> policy->cur can be changed
> whereas there is this comment in intel_pstate_set_policy():
> /*
> * policy->cur is never updated with the intel_pstate driver, but it
> * is used as a stale frequency value. So, keep it within limits.
> */
> but cpufreq_driver_fast_switch() updates it with the freq returned by
> intel_cpufreq_fast_switch()

Perhaps I should submit a patch clarifying that comment.
It is true for the "intel_pstate" CPU frequency scaling driver but not for the
"intel_cpufreq" CPU frequency scaling driver, also known as the intel_pstate
driver in passive mode. Sorry for any confusion.

I ran the tracer during the test and do observe many, but
not all, CPUs requesting pstate 48 (4.8 GHz) when the max is set to 24 (2.4 GHz).
The request always seems to come via the "fast_switch" path.
The root issue here appears to be a limit clamping problem for that path.
I'll try to attach a couple of graphs and screenshots from the tracer data.

I do not know how to trace cpufreq at the same time.

.. Doug

Attachment: all_cpu_pstates.png
Description: PNG image

Attachment: cpu5-example.png
Description: PNG image