Re: [PATCH 2/2] thermal: cpufreq_cooling: Reuse effective_cpu_util()

From: Lukasz Luba
Date: Mon Oct 19 2020 - 07:11:27 EST




On 10/19/20 8:40 AM, Viresh Kumar wrote:
On 30-07-20, 12:16, Lukasz Luba wrote:
Hi Viresh,

On 7/30/20 7:24 AM, Viresh Kumar wrote:
On 17-07-20, 11:46, Vincent Guittot wrote:
On Thu, 16 Jul 2020 at 16:24, Lukasz Luba <lukasz.luba@xxxxxxx> wrote:
On 7/16/20 12:56 PM, Peter Zijlstra wrote:
Currently cpufreq_cooling appears to estimate the CPU energy usage by
calculating the percentage of idle time using the per-cpu cpustat stuff,
which is pretty horrific.

Even worse, it then *samples* the *current* CPU frequency at that
particular point in time and assumes that when the CPU wasn't idle
during that period - it had *this* frequency...
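
To put rough numbers on the concern above, here is a minimal sketch in
plain userspace C, not kernel code. The slice layout, the power values
and both function names are hypothetical, and idle power is assumed to
be zero. It compares an estimate built from the overall busy share and
the power at the last sampled frequency against one that weights each
slice by the power of the frequency it actually ran at.

```c
#include <assert.h>
#include <stddef.h>

/* One slice of the sampling period (hypothetical layout). */
struct slice {
	unsigned int ms;       /* duration of the slice in ms */
	unsigned int busy_pct; /* 0..100, share of the slice spent non-idle */
	unsigned int power_mw; /* power at this slice's frequency when busy */
};

/*
 * Estimate in the style criticised above: busy share over the whole
 * period, multiplied by the power of the frequency seen at the end.
 */
unsigned int estimate_sampled_mw(const struct slice *s, size_t n)
{
	unsigned long long busy_ms = 0, total_ms = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		busy_ms += (unsigned long long)s[i].ms * s[i].busy_pct / 100;
		total_ms += s[i].ms;
	}
	assert(total_ms > 0);

	/* Only the last slice's frequency is "seen" by the sampler. */
	return (unsigned int)(s[n - 1].power_mw * busy_ms / total_ms);
}

/* Reference: weight every slice by the power of its own frequency. */
unsigned int estimate_weighted_mw(const struct slice *s, size_t n)
{
	unsigned long long energy = 0, total_ms = 0; /* energy in mW * ms */
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		energy += (unsigned long long)s[i].power_mw * s[i].ms *
			  s[i].busy_pct / 100;
		total_ms += s[i].ms;
	}
	assert(total_ms > 0);

	return (unsigned int)(energy / total_ms);
}
```

With 80 ms fully busy at a 2000 mW operating point followed by 20 ms at
10% busy and 200 mW, the sampled estimate sees only the low frequency
at the end of the period and comes out roughly ten times lower than the
weighted one.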

So there are 2 problems in the power calculation of the cpufreq cooling device:
- How to get an accurate utilization level of the cpu, which is what
this patch is trying to fix, because using idle time is just wrong
whereas scheduler utilization is frequency invariant

Since this patch is targeted only towards fixing this particular
problem, should I change something in the patch to make it acceptable
?

- How to get a power estimate from this utilization level. As you
pointed out, using the current freq is not accurate.

This should be tackled separately I believe.


I don't think that these two are separate. Furthermore, I think we
would need this kind of information also in future in the powercap.
I've discussed with Daniel this possible scenario.

We have a vendor who presented an issue with the IPA input power and
pointed out these problems. Unfortunately, I don't have this vendor's
phone, but I assume it can last a few minutes without changing the
max allowed OPP. Based on their plots, the frequency driven by the
governor is changing, and idle periods are also present during the IPA
period.

Please give me a few days, because I am also plumbing this stuff
and would like to present it. These two interfaces: involving cpufreq
driver or fallback mode for utilization and EM.

It's been almost 3 months; do we have any update on this? We really
would like to get this patchset merged in some form as it provides a
simple update and I think more work can be done by anyone over it in
future.


I made a few implementations to compare the results with reality (power
measured using a power meter on the cluster rails). The idea of taking
utilization from schedutil_cpu_util() has some edge cases with errors.
The signal is good for comparison and short prediction, but taking it as
an approximation for an arbitrary past period (e.g. 100ms) has issues.
It is good when estimating energy cost during e.g. compute_energy().

What your renamed version of the old schedutil_cpu_util() does is return
the sum of the utilization of the runqueues (CFS, RT, DL, (IRQ)) at that
time. This utilization depends on the summed utilization of the tasks
sitting there. These tasks could have shuffled in the past (especially
when we deal with a period of ~100ms in IPA)...
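
As a rough illustration of that edge case, here is a minimal sketch in
plain userspace C under simplifying assumptions. It is not the kernel's
implementation, which is likely more involved than a plain sum; the
struct, both function names and the 1024 capacity constant are
hypothetical. It models a snapshot as the clamped sum of per-class
utilization and compares the last snapshot with the average over
several snapshots taken across the period.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_CAPACITY 1024U /* assumed full-scale capacity of the CPU */

/* Per-class utilization at one instant (hypothetical layout). */
struct rq_util {
	unsigned int cfs;
	unsigned int rt;
	unsigned int dl;
	unsigned int irq;
};

/* Snapshot: sum of the class utilizations, clamped to the capacity. */
unsigned int snapshot_util(const struct rq_util *u)
{
	unsigned int sum = u->cfs + u->rt + u->dl + u->irq;

	return sum < CPU_CAPACITY ? sum : CPU_CAPACITY;
}

/* Average of the snapshots taken over a past window. */
unsigned int window_avg_util(const struct rq_util *samples, size_t n)
{
	unsigned long long total = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		total += snapshot_util(&samples[i]);

	return (unsigned int)(total / n);
}
```

If the CPU was loaded for most of the window and its tasks moved away
just before the read, the single snapshot at the end reports far less
than what the CPU actually did during the window.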

I am currently working on a few different topics, not full time on this
one. Thus, I tend to agree that this provides 'simple update and ...
more work can be done' in future. Although, I am a bit concerned that it
would require some exports from the scheduler and some changes to
schedutil, and I am not sure they would pay off.

If Rafael and Peter will allow you to change these sub-systems, then I
don't mind.

What I am trying to implement is different from this idea.

Regards,
Lukasz