Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling

From: Lukasz Luba
Date: Tue Feb 08 2022 - 04:32:46 EST

On 2/8/22 12:50 AM, Matthias Kaehlcke wrote:
On Mon, Feb 07, 2022 at 07:30:35AM +0000, Lukasz Luba wrote:
The Energy Model supports power values either in milli-Watts or in some
abstract scale. When the latter is in use, the IPA thermal governor should
not be allowed to operate, since the relation between cooling devices is
not properly defined. For example, a big GPU might have lower power values
in an abstract scale than a little CPU. To prevent the thermal control
algorithm from misbehaving, simply do not register a cooling device
capable of working with IPA.
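A minimal userspace sketch of the registration-time check described above, assuming a simplified model: `struct em_sketch`, `struct cdev_sketch` and `cdev_register_sketch()` are hypothetical stand-ins and are not the kernel's Energy Model or thermal API. The device only exposes the power callbacks IPA relies on when its EM reports real milli-Watts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an Energy Model performance domain. */
struct em_sketch {
	const char *name;
	bool milliwatts;	/* true: power values in mW; false: abstract scale */
};

/* Hypothetical cooling device; power_ops mirrors whether the
 * power-related callbacks that IPA relies on are provided. */
struct cdev_sketch {
	const char *name;
	bool power_ops;
};

/* Register a cooling device from its EM, exposing the IPA power
 * callbacks only when the EM reports real milli-Watts. */
static struct cdev_sketch cdev_register_sketch(const struct em_sketch *em)
{
	struct cdev_sketch cdev;

	assert(em != NULL);
	cdev.name = em->name;
	cdev.power_ops = em->milliwatts;
	if (!cdev.power_ops)
		printf("%s: EM uses an abstract scale, not usable by IPA\n",
		       cdev.name);
	return cdev;
}
```

In this sketch the device itself is still returned, so a governor that does not need power values could in principle still use it; only the IPA-facing capability is withheld.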

Ugh, this would break thermal throttling for existing devices that are
currently supported in the upstream kernel.

Could you point me to those devices? I cannot find them in the mainline
DT. There are no GPU devices which register an Energy Model (EM)
upstream, either via DT (which would give power in mW) or by explicitly
providing an EM get_power() callback. The EM is needed to use IPA.

Please clarify which existing devices are going to be broken with this
change.


Wasn't the conclusion that it is the responsibility of the device tree
owners to ensure that cooling devices with different scales aren't used
in the same thermal zone?

It's based on the assumption that someone has the DT and controls it.
There was also an implicit assumption that IPA would work properly on
such a platform, but it won't.

1. You cannot have two thermal zones, one with the CPUs and the other
with the GPU only, each working with its own instance of IPA.

2. The abstract power scale doesn't guarantee anything about the power
values, and IPA was simply designed with milli-Watts in mind. So even
IPA working on CPUs only, using bogoWatts, is not something we could
guarantee.
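To illustrate why mixing scales breaks the budget split, here is a simplified proportional division of a power budget among cooling devices. This is an assumption-laden sketch, not the kernel's power_allocator code: weights and the redistribution of unused power are omitted, `split_budget()` is a hypothetical name, and the numbers in the test are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified proportional split of a power budget among cooling
 * devices ("actors"). Weights and redistribution of unused power,
 * which the real governor handles, are deliberately left out. */
static void split_budget(const uint32_t *req, uint32_t *granted, size_t n,
			 uint32_t budget)
{
	uint64_t total = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		total += req[i];

	for (i = 0; i < n; i++) {
		if (total <= budget)
			granted[i] = req[i];	/* everything fits, no throttling */
		else
			granted[i] = (uint32_t)((uint64_t)req[i] * budget / total);
	}
}
```

With a CPU requesting 2000 mW, a GPU requesting 3000 mW and a 2500 mW budget, the split is 1000/1500. If the GPU instead reports the same real 3000 mW as 300 abstract units, the total request (2300) appears to fit in the budget, so nothing is throttled while the real draw stays around 5000 mW.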


That's also what's currently specified in the power allocator
documentation:

Another important thing is the consistent scale of the power values
provided by the cooling devices. All of the cooling devices in a single
thermal zone should have power values reported either in milli-Watts
or scaled to the same 'abstract scale'.
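The documented rule amounts to a per-zone consistency condition, sketched here under assumptions: `enum power_scale` and `zone_scales_consistent()` are hypothetical and do not exist in the kernel. The sketch only illustrates the condition that DT owners or reviewers would be asked to enforce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor: the power scale a cooling device reports. */
enum power_scale { SCALE_MILLIWATT, SCALE_ABSTRACT };

/* Returns true when every cooling device bound to one thermal zone
 * reports power in the same scale, as the quoted documentation asks. */
static bool zone_scales_consistent(const enum power_scale *scales, size_t n)
{
	size_t i;

	assert(scales != NULL || n == 0);
	for (i = 1; i < n; i++)
		if (scales[i] != scales[0])
			return false;
	return true;	/* zero or one device is trivially consistent */
}
```

Note that, per point 2 earlier in the thread, a zone whose devices all use the abstract scale passes this check yet still gives IPA no guarantee about the power values.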

This can change. We have recently removed the userspace governor from
the kernel. The trend is to implement thermal policy in FW. Dealing with
such intermediate configurations leads to a complicated design, and
supporting the algorithm logic also becomes more complex.


Which you actually added yourself:

commit 5a64f775691647c242aa40d34f3512e7b179a921
Author: Lukasz Luba <lukasz.luba@xxxxxxx>
Date: Tue Nov 3 09:05:58 2020 +0000

PM: EM: Clarify abstract scale usage for power values in Energy Model

The Energy Model (EM) can store power values in milli-Watts or in abstract
scale. This might cause issues in the subsystems which use the EM for
estimating the device power, such as:

- mixing of different scales in a subsystem which uses multiple
(cooling) devices (e.g. thermal Intelligent Power Allocation (IPA))

- assuming that energy [milli-Joules] can be derived from the EM power
values which might not be possible since the power scale doesn't have
to be in milli-Watts

To avoid misconfiguration add the requisite documentation to the EM and
related subsystems: EAS and IPA.

Signed-off-by: Lukasz Luba <lukasz.luba@xxxxxxx>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>


It's ugly to have the abstract scales in the first place, but that's
unfortunately what we currently have for at least some cooling devices.

A few questions:
By 'we', do you mean Chrome engineers?
Could you point me to those devices, please?
Are they new platforms, or old ones that only need maintenance?
How does IPA work for you in such a real platform configuration?
If possible, could you share some plots of temperature, frequency, and
CPU and GPU utilization?
Do you know how the real power was scaled for them?

That would help me understand the situation and judge.


IMO it would be preferable to stick to catching non-compliant
configurations in reviews, rather than breaking thermal throttling for
existing devices with configurations that comply with the current
documentation.


Without access to the source code of those devices, it's hard for me to
see whether they would be broken.

Regards,
Lukasz