Re: [PATCH 1/2] thermal: power allocator: change the 'k_i' coefficient estimation

From: Lukasz Luba
Date: Tue Oct 13 2020 - 08:04:55 EST




On 10/13/20 12:22 PM, Daniel Lezcano wrote:
On 13/10/2020 12:59, Lukasz Luba wrote:
Hi Daniel,

On 10/13/20 11:21 AM, Daniel Lezcano wrote:

Hi Lukasz,

On 02/10/2020 14:24, Lukasz Luba wrote:
Intelligent Power Allocation (IPA) is built around the PID controller
concept. The initialization code tries to setup the environment based on
the information available in DT or estimate the value based on minimum
power reported by each of the cooling device. The estimation will
have an
impact on the PID controller behaviour via the related 'k_po', 'k_pu',
'k_i' coefficients and also on the power budget calculation.

This change prevents the situation when 'k_i' is relatively big compared
to 'k_po' and 'k_pu' values. This might happen when the estimation for
'sustainable_power' returned small value, thus 'k_po' and 'k_pu' are
small.

Signed-off-by: Lukasz Luba <lukasz.luba@xxxxxxx>
---
  drivers/thermal/gov_power_allocator.c | 8 ++++++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/gov_power_allocator.c
b/drivers/thermal/gov_power_allocator.c
index 5cb518d8f156..f69fafe486a5 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -131,6 +131,7 @@ static void estimate_pid_constants(struct
thermal_zone_device *tz,
      int ret;
      int switch_on_temp;
      u32 temperature_threshold;
+    s32 k_i;
        ret = tz->ops->get_trip_temp(tz, trip_switch_on,
&switch_on_temp);
      if (ret)
@@ -156,8 +157,11 @@ static void estimate_pid_constants(struct
thermal_zone_device *tz,
          tz->tzp->k_pu = int_to_frac(2 * sustainable_power) /
              temperature_threshold;
  -    if (!tz->tzp->k_i || force)
-        tz->tzp->k_i = int_to_frac(10) / 1000;
+    if (!tz->tzp->k_i || force) {
+        k_i = tz->tzp->k_pu / 10;
+        tz->tzp->k_i = k_i > 0 ? k_i : 1;
+    }

I do not understand the rational behind this change.

This is the unfortunate impact of the EM abstract scale of power values.
IPA didn't have to deal with it, because we always had milli-Watts.
Because the EM allows the bogoWatts and some vendors already have
them I have to re-evaluate the IPA.


Do you have some values to share describing what would be the impact of
this change?

Yes, here is an example:
EM has 3 devices with abstract scale power values, where minimum power
is 25 and max is 200. The minimum power is used by
estimate_sustainable_power()
as a sum of all devices' min power. Sustainable power is going to be
estimated to 75.

Then in the code we have 'temperature_threshold' which is in
milli-Celcius, thus 15degC is 15000.

We estimate 'k_po' according to:
int_to_frac(sustainable_power) / temperature_threshold;

which is:
(75 << 10) / 15000 = ~75000 / 15000 = 5 <-- 'k_po'

then k_pu:
((2*75) << 10) / 15000 = ~150000 / 15000 = 10

Then the old 'k_i' is just hard-coded 10, which is
the same order of magnitude to what is in 'k_pu'.
It should be 1 order of magnitude smaller than 'k_pu'.

I did some experiments and the bigger 'k_i' slows down a lot
the rising temp. That's why this change.

It was OK to have k_i=10 when we were in milliWatts world,
when the min power value was bigger, thus 'k_pu' was also bigger
than our hard-coded 'k_i'.


Depending on the thermal behavior of a board, these coefficients could
be very different, no ?


Yes, I strongly believe that vendor engineers will make experiments with
these values and not go with default. Then they will store the k_pu,
k_po, k_i via sysfs interface, with also sustainable_power.

IMHO it is the opposite. For what I've seen, the IPA is not used or the
k_* are misunderstood, thus not changed. The PID regulation loop
technique is not quite used and known by everyone.

There is quite a few DT entries of 'sustainable-power' so I assumed
it is known, but you might be right.


But I have to also fix the hard-coded k_i in the estimation. As
described above, when we have small power values from abstract scale,
the k_i stays too big.

May be it is preferable to adjust the k_* dynamically given the
undershot and overshot results? And then add a set of less opaque
parameters for the user, like the time or watts, no?


Hmmmm, this is interesting, I haven't thought about it. Thank you
for this idea.
That would require a re-design of current IPA. IPA trying to figure
out better k_* values... I will discuss it internally.

It would take time, definitely more than the proposed small fix
addressing abstract scale and hard-coded 'k_i'.
Do you think that this fix can be applied and then I can experiment
on what you suggested?

There is v3 reviewed by Ionela [1].

Thank you for your comments.

Regards,
Lukasz

[1] https://lore.kernel.org/linux-pm/20201009135850.14727-1-lukasz.luba@xxxxxxx/