Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling

From: Matthias Kaehlcke
Date: Mon Feb 07 2022 - 20:13:16 EST


On Mon, Feb 07, 2022 at 07:30:35AM +0000, Lukasz Luba wrote:
> The Energy Model supports power values either in Watts or in some abstract
> scale. When the 2nd option is in use, the thermal governor IPA should not
> be allowed to operate, since the relation between cooling devices is not
> properly defined. Thus, it might be possible that a big GPU has lower power
> values in abstract scale than a Little CPU. To mitigate misbehaviour
> of the thermal control algorithm, simply do not register a cooling device
> capable of working with IPA.

Ugh, this would break thermal throttling for existing devices that are
currently supported in the upstream kernel.

Wasn't the conclusion that it is the responsibility of the device tree
owners to ensure that cooling devices with different scales aren't used
in the same thermal zone?

That's also what's currently specified in the power allocator
documentation:

Another important thing is the consistent scale of the power values
provided by the cooling devices. All of the cooling devices in a single
thermal zone should have power values reported either in milli-Watts
or scaled to the same 'abstract scale'.

Which was actually added by yourself:

commit 5a64f775691647c242aa40d34f3512e7b179a921
Author: Lukasz Luba <lukasz.luba@xxxxxxx>
Date: Tue Nov 3 09:05:58 2020 +0000

PM: EM: Clarify abstract scale usage for power values in Energy Model

The Energy Model (EM) can store power values in milli-Watts or in abstract
scale. This might cause issues in the subsystems which use the EM for
estimating the device power, such as:

- mixing of different scales in a subsystem which uses multiple
(cooling) devices (e.g. thermal Intelligent Power Allocation (IPA))

- assuming that energy [milli-Joules] can be derived from the EM power
values which might not be possible since the power scale doesn't have
to be in milli-Watts

To avoid misconfiguration add the requisite documentation to the EM and
related subsystems: EAS and IPA.

Signed-off-by: Lukasz Luba <lukasz.luba@xxxxxxx>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
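To make the scale-mixing problem concrete, below is a deliberately simplified C sketch of a proportional split of a power budget among the cooling devices of one thermal zone. It is a hypothetical model, not the kernel's power_allocator governor code. `split_budget()` and the numbers used here are assumptions for illustration. With consistent milli-Watt values, a 3000 mW budget for requests of 2000 mW (CPU) and 4000 mW (GPU) yields grants of 1000 and 2000. If the GPU instead reports its 4000 mW demand as 400 in some abstract scale, the summed request (2400) appears to fit the budget, so neither device is throttled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified power split: if the summed requests exceed
 * the budget, each device is granted a share proportional to its own
 * request; otherwise every request is granted in full.
 * This is NOT the kernel's power_allocator implementation.
 */
void split_budget(const unsigned int *req, unsigned int *granted,
		  size_t n, unsigned int budget)
{
	unsigned long long total = 0;
	size_t i;

	assert(n == 0 || (req != NULL && granted != NULL));

	/* Sum the requests; the sum is only meaningful if all share one scale. */
	for (i = 0; i < n; i++)
		total += req[i];

	for (i = 0; i < n; i++) {
		if (total <= budget)
			granted[i] = req[i];	/* budget covers everyone */
		else
			granted[i] = (unsigned int)
				((unsigned long long)budget * req[i] / total);
	}
}
```

The division of a single budget only makes sense when every request is expressed in the same unit, which is exactly the constraint the documentation places on the devices of one zone.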


It's ugly to have the abstract scales in the first place, but that's
unfortunately what we currently have for at least some cooling devices.

IMO it would be preferable to stick to catching non-compliant configurations
in reviews, rather than breaking thermal throttling for existing devices
with configurations that comply with the current documentation.
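For illustration only, the documented rule (all cooling devices in a single thermal zone report power in milli-Watts, or all in the same abstract scale) amounts to the per-zone consistency check a reviewer performs by hand. The sketch below expresses it in C. `struct hyp_cooling_dev`, its `scale_id` field, `SCALE_MILLIWATTS` and `zone_scales_consistent()` are all hypothetical names and do not correspond to kernel data structures or APIs. Unlike a per-device rejection, this check only flags zones that actually mix scales.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scale tag: 0 means milli-Watts, any other value names an abstract scale. */
#define SCALE_MILLIWATTS 0

/* Hypothetical description of a cooling device, for illustration only. */
struct hyp_cooling_dev {
	const char *name;
	int scale_id;
};

/* Returns true when every device in the zone uses the same power scale. */
bool zone_scales_consistent(const struct hyp_cooling_dev *devs, size_t n)
{
	size_t i;

	assert(n == 0 || devs != NULL);

	/* Compare every device against the first one in the zone. */
	for (i = 1; i < n; i++)
		if (devs[i].scale_id != devs[0].scale_id)
			return false;
	return true;
}
```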