Re: [PATCH V2 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util()

From: Viresh Kumar
Date: Thu Nov 12 2020 - 04:41:36 EST


On 23-10-20, 12:37, Peter Zijlstra wrote:
> Actual numbers that show the goodness would be nice ;-) Because clearly
> we're doing this make it better.

Hi Peter,

I tried the patchset with hackbench, sysbench and schbench. None of them showed
any regression or significant improvements. schbench was the one I was most
hopeful with as it creates the scenario where the utilization numbers provide a
better estimate of the future.

Scenario 1: The CPUs were mostly idle in the previous polling window of the IPA
governor as the tasks were sleeping and here are the details from traces (load
is in %):

thermal_power_cpu_get_power: Old: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
thermal_power_cpu_get_power: New: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960

Here, the "Old" line gives the load and requested_power (dynamic_power here)
numbers calculated using the idle time based implementation. And "New" is based
on CPU utilization from this patchset.

As can be clearly seen, the load and requested_power numbers are simply
incorrect in the idle time based approach and the numbers collected from CPU's
utilization are much better and will also match the expectations of the
schedutil governor.

Scenario 2: The CPUs were busy in the previous polling window of the IPA
governor:

thermal_power_cpu_get_power: Old: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
thermal_power_cpu_get_power: New: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672

As can be seen, the idle time based load is 100% for all the CPUs as it took
only the last window into account, but in reality the CPUs aren't that loaded as
shown by the utilazation numbers.

Though this patchset improves the power estimation done by the cpufreq_cooling
driver (which matches with the freq scaling governor, schedutil, as well), the
IPA governor doesn't necessarily appreciate the correctness of it as it takes
the decision to choose the next cooling state based on multiple factors, like
current temperature, target temperature, requested_power, all power players (who
request power from it), etc. The algorithm is complex there and I am afraid the
improved numbers here don't necessarily translate to better numbers for the
benchmarks like schbench. Another factor can be IPAs tuning for my platform
(Hikey6220).

Irrespective of the IPA governor, the estimate provided by the cpufreq cooling
driver does improve a lot with this patchset and are better aligned with the
schedutil governor and I believe it would be better to merge this nevertheless.

I have already prepared the next version which takes care of !SMP case, was just
holding it off until I was trying to get some numbers out.

--
viresh