RE: sched/cpufreq: Rework schedutil governor performance estimation - Regression bisected

From: Doug Smythies
Date: Tue Feb 13 2024 - 13:07:52 EST


On 2024.02.13 03:27 Vincent wrote:
> On Sun, 11 Feb 2024 at 17:43, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>> On 2024.02.11 05:36 Vincent wrote:
>>> On Sat, 10 Feb 2024 at 00:16, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>>>> On 2024.02.09.14:11 Vincent wrote:
>>>>> On Fri, 9 Feb 2024 at 22:38, Doug Smythies <dsmythies@xxxxxxxxx> wrote:
>>>>>>
>>>>>> I noticed a regression in the 6.8rc series kernels. Bisecting the kernel pointed to:
>>>>>>
>>>>>> # first bad commit: [9c0b4bb7f6303c9c4e2e34984c46f5a86478f84d]
>>>>>> sched/cpufreq: Rework schedutil governor performance estimation
>>>>>>
>>>>>> There was previous bisection and suggestion of reversion,
>>>>>> but I guess it wasn't done in the end. [1]
>>>>>
>>>>> This has been fixed with
>>>>> https://lore.kernel.org/all/170539970061.398.16662091173685476681.tip-bot2@tip-bot2/
>>>>
>>>> Okay, thanks. I didn't find that one.
>>>>
>>>>>> The regression: reduced maximum CPU frequency is ignored.
>>
>> Perhaps I should have said "sometimes ignored".
>> With a maximum CPU frequency for all CPUs set to 2.4 GHz and
>> a 100% load on CPU 5, its frequency was sampled 1000 times:
>> 28.6% of samples were 2.4 GHz.
>> 71.4% of samples were 4.8 GHz (the max turbo frequency)
>> The results are highly non-repeatable, for example another sample:
>> 32.8% of samples were 2.4 GHz.
>> 76.2% of samples were 4.8 GHz
>>
>> Another interesting side note: If load is added to the other CPUs,
>> the set maximum CPU frequency is enforced.
>
> Could you trace cpufreq and pstate ? I'd like to understand how
> policy->cur can be changed
> whereas there is this comment in intel_pstate_set_policy():
> /*
> * policy->cur is never updated with the intel_pstate driver, but it
> * is used as a stale frequency value. So, keep it within limits.
> */
>
> but cpufreq_driver_fast_switch() updates it with the freq returned by
> intel_cpufreq_fast_switch()

Perhaps I should submit a patch clarifying that comment.
It is true for the "intel_pstate" CPU frequency scaling driver but not for the
"intel_cpufreq" CPU frequency scaling driver, also known as the intel_pstate
driver in passive mode. Sorry for any confusion.

I ran the intel_pstate_tracer.py during the test and do observe many, but
not all, CPUs requesting pstate 48 when the max is set to 24.
The calling request seems to always be via "fast_switch" path.
The root issue here appears to be a limit clamping problem for that path.
I'll try to attach a couple of graphs and screen shots from the tracer data.

I do not know how to trace cpufreq at the same time.

.. Doug

Attachment: all_cpu_pstates.png
Description: PNG image

Attachment: cpu5-example.png
Description: PNG image