Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling

From: Lukasz Luba
Date: Wed Feb 16 2022 - 18:28:26 EST




On 2/16/22 5:21 PM, Doug Anderson wrote:
Hi,

On Tue, Feb 8, 2022 at 1:32 AM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:

Another important thing is the consistent scale of the power values
provided by the cooling devices. All of the cooling devices in a single
thermal zone should have power values reported either in milli-Watts
or scaled to the same 'abstract scale'.

This can change. We recently removed the userspace governor from the
kernel. The trend is to implement thermal policy in FW. Dealing with
some intermediate configurations complicates the design, and supporting
the algorithm logic also becomes more complex.

One thing that didn't get addressed is the whole "The trend is to
implement thermal policy in FW". I'm not sure I can get on board with
that trend. IMO "moving to FW" isn't a super great trend. FW is harder
to update than kernel and trying to keep it in sync with the kernel
isn't wonderful. Unless something _has_ to be in FW I personally
prefer it to be in the kernel.

There are pros and cons for both approaches (as always).

That said, there are some use cases where the kernel is not able to
react fast enough, e.g. sudden changes in power usage, which can push
the power rail outside its required operating conditions.
When we are talking about tough requirements for those power & thermal
policies, the mechanism must be fast, precise and reliable.

Here you can find the Arm reference FW implementation and an IPA clone
in there (I have been reviewing this) [1][2].

As you can see there is a new FW feature set:
"MPMM, Traffic-cop and Thermal management".

Apart from the Arm implementation, there are already known thermal
monitoring mechanisms in HW/FW, like in the new Qcom SoCs which
use this driver code [3]. The driver receives an interrupt
about throttling conditions and just populates the thermal pressure.
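
To illustrate what "populating the thermal pressure" amounts to, here is
a minimal userspace sketch. It assumes CPU capacity scales linearly with
the frequency that HW/FW still allows. The helper name and its parameters
are made up for illustration and are not a kernel API; the real driver
code is in [3].

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): given the CPU's maximum
 * capacity, its maximum frequency and the frequency cap reported by
 * HW/FW, return the capacity lost to throttling, i.e. the value that
 * would be reported as thermal pressure.
 */
static uint64_t thermal_pressure_from_cap(uint64_t max_capacity,
                                          uint64_t max_freq_khz,
                                          uint64_t capped_freq_khz)
{
    uint64_t capacity;

    assert(max_freq_khz > 0);

    /* A cap above the maximum frequency means no throttling at all. */
    if (capped_freq_khz > max_freq_khz)
        capped_freq_khz = max_freq_khz;

    /* Assumption: capacity scales linearly with the allowed frequency. */
    capacity = max_capacity * capped_freq_khz / max_freq_khz;

    return max_capacity - capacity;
}
```

For example, a CPU with capacity 1024 and a 2 GHz maximum that is capped
to 1.5 GHz would report a pressure of 256 in this sketch.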


...although now that I re-read this, I'm not sure which firmware you
might be talking about. Is this the AP firmware, or some companion
chip / coprocessor? Even so, I'd still rather see things done in the
kernel when possible...

It's FW running on a dedicated microprocessor. In Arm SoCs it's usually
some Cortex-M. We communicate with it from the kernel via SCMI drivers
(using shared memory and mailboxes). We recommend using the SCMI
protocol to send e.g. a 'performance request' to the FW via a 'fast
channel', instead of implementing PMIC and clock handling and doing
the voltage & freq change in the kernel (using drivers & locking). That
approach avoids costly locking and lets the task scheduler go via
the SCMI cpufreq driver [4] and the SCMI perf layer [5].
We don't need a dedicated 'sugov' kthread with a Deadline policy to
do that work and preempt the currently running task.
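
A rough userspace model of why the fast channel is cheap, assuming the
request boils down to one store into a shared memory word plus an
optional doorbell write. The struct, field and function names below are
hypothetical and only mimic the idea; the real code is in [5].

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a fast channel; not the kernel's data structure. */
struct fast_channel {
    _Atomic uint32_t *level_slot; /* shared memory word read by the FW */
    _Atomic uint32_t *doorbell;   /* optional doorbell word, may be NULL */
    uint32_t doorbell_value;      /* value that signals "new request" */
};

/*
 * Post a performance level request without taking a lock or sleeping:
 * publish the level first, then ring the doorbell if there is one.
 */
static void fast_channel_set_level(struct fast_channel *fc, uint32_t level)
{
    assert(fc != NULL && fc->level_slot != NULL);

    atomic_store_explicit(fc->level_slot, level, memory_order_release);

    if (fc->doorbell != NULL)
        atomic_store_explicit(fc->doorbell, fc->doorbell_value,
                              memory_order_release);
}
```

Since nothing in this path blocks, it can in principle be called straight
from the scheduler's frequency update path, which is the point made above
about not needing the 'sugov' kthread.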

IMHO the FW approach opens new opportunities.

Regards,
Lukasz

[1] https://github.com/ARM-software/SCP-firmware/pull/588
[2] https://github.com/ARM-software/SCP-firmware/pull/588/commits/59c62ead5eb66353ae805c367bfa86192e28c410
[3] https://elixir.bootlin.com/linux/v5.17-rc4/source/drivers/cpufreq/qcom-cpufreq-hw.c#L287
[4] https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L65
[5] https://elixir.bootlin.com/linux/v5.17-rc4/source/drivers/firmware/arm_scmi/perf.c#L465