Re: [PATCH v2] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588

From: Daniel Lezcano
Date: Tue Jan 23 2024 - 19:15:05 EST


On 23/01/2024 20:47, Alexey Charkov wrote:
On Mon, Jan 22, 2024 at 4:04 AM Daniel Lezcano
<daniel.lezcano@xxxxxxxxxx> wrote:


Hi Alexey,


On 21/01/2024 20:57, Alexey Charkov wrote:
On Fri, Jan 19, 2024 at 8:21 PM Daniel Lezcano
<daniel.lezcano@xxxxxxxxxx> wrote:
Hello Daniel,

Thanks a lot for your review and comments! Please see some reflections below.

On 09/01/2024 20:19, Alexey Charkov wrote:
Include thermal zones information in device tree for rk3588 variants
and enable the built-in thermal sensing ADC on RADXA Rock 5B

Signed-off-by: Alexey Charkov <alchark@xxxxxxxxx>
---
Changes in v2:
- Dropped redundant comments
- Included all CPU cores in cooling maps
- Split cooling maps into more granular ones utilizing TSADC
channels 1-3 which measure temperature by separate CPU clusters
instead of channel 0 which measures the center of the SoC die
---
.../boot/dts/rockchip/rk3588-rock-5b.dts | 4 +
arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 151 ++++++++++++++++++
2 files changed, 155 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
index a5a104131403..f9d540000de3 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
@@ -772,3 +772,7 @@ &usb_host1_ehci {
&usb_host1_ohci {
status = "okay";
};
+
+&tsadc {
+ status = "okay";
+};
diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
index 8aa0499f9b03..8d54998d0ecc 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
@@ -10,6 +10,7 @@
#include <dt-bindings/reset/rockchip,rk3588-cru.h>
#include <dt-bindings/phy/phy.h>
#include <dt-bindings/ata/ahci.h>
+#include <dt-bindings/thermal/thermal.h>

/ {
compatible = "rockchip,rk3588";
@@ -2112,6 +2113,156 @@ tsadc: tsadc@fec00000 {
status = "disabled";
};

+ thermal_zones: thermal-zones {
+ /* sensor near the center of the whole chip */
+ soc_thermal: soc-thermal {
+ polling-delay-passive = <20>;

There is no mitigation set for this thermal zone. It is pointless to
specify a passive polling.

Indeed, it makes sense to me. There seems to be a catch though in that
the driver calls the generic thermal_of_zone_register during the
initial probe, which expects both of those polling delays to be
present in the device tree, otherwise it simply refuses to add the
respective thermal zone, see drivers/thermal/thermal_of.c:502

Usually:

polling-delay-passive = <0>;
polling-delay = <0>;

cf:

git grep "polling-delay = <0>" arch/arm64/boot/dts

For some reason when I have both polling-delay-passive and
polling-delay set to 0, the active cooling map I have in my board DT
(using a PWM controlled fan) behaves weirdly.



I use the following fragment in my board DTS:

+&package_thermal {
+ trips {
+ package_fan: package-fan {
+ temperature = <55000>;
+ hysteresis = <2000>;
+ type = "active";
+ };
+ };
+
+ cooling-maps {
+ map-fan {
+ trip = <&package_fan>;
+ cooling-device = <&fan THERMAL_NO_LIMIT
THERMAL_NO_LIMIT>;
+ };
+ };
+};

If I add polling-delay = <1000>; at the top, the fan speeds up and
down dynamically as the package temperature swings around 55C. If I
remove that (having set polling-delay = <0>; in rk3588s.dtsi), the fan
speeds up to the midpoint cooling state once the package temperature
approaches 55C, and then it just stays there forever: it doesn't speed
up above the midpoint even as the temperature climbs above 70C, nor
does it spin down as it falls back to around 45C.

Is that the expected behavior for when the polling is disabled?

I don't know the rest of the DT this fragment was added to, but I'm not surprised there is misbehavior because the configuration is not correct in this case.

If there is a thermal zone with an active trip and an associated cooling device like a fan, then:
-> polling-delay = <a_value>;
-> polling-delay-passive = <0>;

If there is a thermal zone with a passive cooling device like the cpufreq cooling device, then there are 2 cases:

1. The sensor supports interrupt when crossing the trip point
-> polling-delay = <0>;
-> polling-delay-passive = <a_value>;

2. The sensor does not support interrupt when crossing the trip point
-> polling-delay = <a_value>;
-> polling-delay-passive = <another_value>;

Why?

When the cooling device is a passive cooling device, the mitigation happens with a higher temperature sampling rate in order to change the state of the cooling device hundreds of times per second. With a fan, the cooling effect is too slow for that, so we keep the regular polling (polling-delay) for it.
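
As a rough sketch of the first rule applied to the board fragment quoted above: the trip, map and &fan reference are copied from that fragment, and the 1000 ms delay is the value Alexey reported as working. It assumes the zone has no passive cooling map attached, and it has not been tested on hardware.

```dts
/*
 * Sketch only: active trip driven by a fan, so the zone keeps its
 * regular polling and disables passive polling.
 * THERMAL_NO_LIMIT comes from <dt-bindings/thermal/thermal.h>, which
 * the patch already includes in rk3588s.dtsi.
 */
&package_thermal {
	/* regular polling, in milliseconds: value reported in this thread */
	polling-delay = <1000>;
	/* assumption: no passive cooling device is mapped to this zone */
	polling-delay-passive = <0>;

	trips {
		package_fan: package-fan {
			temperature = <55000>;
			hysteresis = <2000>;
			type = "active";
		};
	};

	cooling-maps {
		map-fan {
			trip = <&package_fan>;
			cooling-device = <&fan THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
		};
	};
};
```

For a zone with a cpufreq cooling device the two properties would be set the other way round, following the two passive cases listed above, depending on whether the sensor raises an interrupt when a trip point is crossed.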


I haven't yet studied in detail if passive cooling kicks in correctly
with polling disabled, but this behavior with active cooling left me
quite confused - any pointers would be much appreciated.

Thanks a lot,
Alexey

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs