Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

From: Dragan Simic
Date: Fri Mar 01 2024 - 01:14:16 EST


On 2024-03-01 06:20, Alexey Charkov wrote:
On Fri, Mar 1, 2024 at 1:11 AM Dragan Simic <dsimic@xxxxxxxxxxx> wrote:
Please see also some nitpicks below, which I forgot to mention in
my earlier response. I'm sorry for that.

On 2024-02-29 20:26, Alexey Charkov wrote:
> Include thermal zones information in device tree for RK3588 variants.
>
> This also enables the TSADC controller unconditionally on all boards
> to ensure that thermal protections are in place via throttling and
> emergency reset, once OPPs are added to enable CPU DVFS.
>
> The default settings (using CRU as the emergency reset mechanism)
> should work on all boards regardless of their wiring, as CRU resets
> do not depend on any external components. Boards that have the TSHUT
> signal wired to the reset line of the PMIC may opt to switch to GPIO
> tshut mode instead (rockchip,hw-tshut-mode = <1>;)
>
> It seems though that downstream kernels don't use that, even for
> those boards where the wiring allows for GPIO based tshut, such as
> Radxa Rock 5B [1], [2], [3]
>
> [1]
> https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts#L540
> [2]
> https://github.com/radxa/kernel/blob/stable-5.10-rock5/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#L5433
> [3] https://dl.radxa.com/rock5/5b/docs/hw/radxa_rock_5b_v1423_sch.pdf
> page 11 (TSADC_SHUT_H)
>
> Signed-off-by: Alexey Charkov <alchark@xxxxxxxxx>
> ---
> arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 176
> +++++++++++++++++++++++++++++-
> 1 file changed, 175 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> index 36b1b7acfe6a..9bf197358642 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> @@ -10,6 +10,7 @@
> #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> #include <dt-bindings/phy/phy.h>
> #include <dt-bindings/ata/ahci.h>
> +#include <dt-bindings/thermal/thermal.h>
>
> / {
> compatible = "rockchip,rk3588";
> @@ -2225,7 +2226,180 @@ tsadc: tsadc@fec00000 {
> pinctrl-1 = <&tsadc_shut>;
> pinctrl-names = "gpio", "otpout";
> #thermal-sensor-cells = <1>;
> - status = "disabled";
> + status = "okay";
> + };
> +
> + thermal_zones: thermal-zones {
> + /* sensor near the center of the SoC */
> + package_thermal: package-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 0>;
> +
> + trips {
> + package_crit: package-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + /* sensor between A76 cores 0 and 1 */
> + bigcore0_thermal: bigcore0-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 1>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */

See, I'm not a native English speaker, but I've spent a lot of time
and effort improving my English skills. Thus, perhaps these comments
may or may not seem like unnecessary nitpicking, depending on how much
someone pays attention to writing style in general, but I'll risk to
be annoying and state these comments anyway. :)

The comment above could be written in a much more condensed form like
this, which would also be a bit more accurate:


/* IPA threshold, when IPA governor is used */

IOW, we're writing all this for someone to read later, but we should
(and can) perfectly reasonably expect some already existing background
knowledge from the readers. In other words, we should be as concise
as possible.

In fact, the power allocation governor code itself doesn't call those
trips threshold or target as your suggested wording would imply.
Instead, it calls them "switch on temperature" and "maximum desired
temperature" [1]. Maybe we can call them that in the comments (and
also avoid calling the governor IPA, because upstream code only calls
it a "power allocator").

Hmm, but "IPA" is still mentioned in exactly three places in the files
under drivers/thermal. I think that warrants the use of "IPA", which
is also widely used pretty much everywhere.

Perhaps a win-win would be to have only the very first of the comments
like this, to introduce "IPA" as an acronym:

/* Power allocator (IPA) thermal governor */
/* switch-on point, when IPA governor is used */

Next, "the target temperature" is mentioned more than a few times in
drivers/thermal/gov_power_allocator.c, which I believe makes the use
of "IPA target" perfectly valid. Actually, let's use "IPA target
temperature", if you agree, to make it self descriptive.

Finally, the threshold... Based on drivers/thermal/gov_power_allocator.c,
I think "IPA switch-on point" would be a good choice, which I already
used above in the proposed opening comment.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/gov_power_allocator.c#n483

> + bigcore0_alert0: bigcore0-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */

Similarly to the above, I'd suggest this:

/* IPA target, when IPA governor is used */

Having such brief comments should make it all perfectly understandable
to anyone who's already familiar with the way IPA governor works.
Everyone else should be welcome to read up a bit on IPA first.

> + bigcore0_alert1: bigcore0-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + bigcore0_crit: bigcore0-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&bigcore0_alert1>;
> + cooling-device =
> + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor between A76 cores 2 and 3 */
> + bigcore2_thermal: bigcore2-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 2>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */

The same as above.

> + bigcore2_alert0: bigcore2-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */

The same as above.

> + bigcore2_alert1: bigcore2-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + bigcore2_crit: bigcore2-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&bigcore2_alert1>;
> + cooling-device =
> + <&cpu_b2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_b3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor between the four A55 cores */
> + little_core_thermal: littlecore-thermal {
> + polling-delay-passive = <100>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 3>;
> +
> + trips {
> + /* threshold to start collecting temperature
> + * statistics e.g. with the IPA governor
> + */

The same as above.

> + littlecore_alert0: littlecore-alert0 {
> + temperature = <75000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + /* actual control temperature */

The same as above.

> + littlecore_alert1: littlecore-alert1 {
> + temperature = <85000>;
> + hysteresis = <2000>;
> + type = "passive";
> + };
> + littlecore_crit: littlecore-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + cooling-maps {
> + map0 {
> + trip = <&littlecore_alert1>;
> + cooling-device =
> + <&cpu_l0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> + <&cpu_l3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> + };
> + };
> + };
> +
> + /* sensor near the PD_CENTER power domain */
> + center_thermal: center-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 4>;
> +
> + trips {
> + center_crit: center-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + gpu_thermal: gpu-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 5>;
> +
> + trips {
> + gpu_crit: gpu-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> +
> + npu_thermal: npu-thermal {
> + polling-delay-passive = <0>;
> + polling-delay = <0>;
> + thermal-sensors = <&tsadc 6>;
> +
> + trips {
> + npu_crit: npu-crit {
> + temperature = <115000>;
> + hysteresis = <0>;
> + type = "critical";
> + };
> + };
> + };
> };
>
> saradc: adc@fec10000 {