Re: [PATCH 1/2] thermal: power_allocator: maintain the device statistics from going stale

From: Lukasz Luba
Date: Tue Apr 06 2021 - 04:44:12 EST




On 4/2/21 4:54 PM, Daniel Lezcano wrote:
On 31/03/2021 18:33, Lukasz Luba wrote:
When the temperature is below the first activation trip point the cooling
devices are not checked, so they cannot maintain fresh statistics. It
leads into the situation, when temperature crosses first trip point, the
statistics are stale and show state for very long period.

Can you elaborate the statistics you are referring to ?

I can understand the pid controller needs temperature but I don't
understand the statistics with the cooling device.



The allocate_power() calls cooling_ops->get_requested_power(),
which is for CPUs cpufreq_get_requested_power() function.
In that cpufreq implementation for !SMP we still has the
issue of stale statistics. Viresh, when he introduced the usage
of sched_cpu_util(), he fixed that 'long non-meaningful period'
of the broken statistics and it can be found since v5.12-rc1.

The bug is still there for the !SMP. Look at the way how idle time
is calculated in get_load() [1]. It relies on 'idle_time->timestamp'
for calculating the period. But when this function is not called,
the value can be very far away in time, e.g. a few seconds back,
when the last allocate_power() was called.

The bug is there for both SMP and !SMP [2] for older kernels, which can
be used in Android or ChromeOS. I've been considering to put this simple
IPA fix also to some other kernels, because Viresh's change is more
a 'feature' and does not cover both platforms.

Regards,
Lukasz

[1] https://elixir.bootlin.com/linux/v5.12-rc5/source/drivers/thermal/cpufreq_cooling.c#L156
[2] https://elixir.bootlin.com/linux/v5.11.11/source/drivers/thermal/cpufreq_cooling.c#L143