Re: [PATCH v2] arm64: dts: rockchip: enable built-in thermal monitoring on rk3588

From: Alexey Charkov
Date: Sun Jan 21 2024 - 14:58:26 EST


On Fri, Jan 19, 2024 at 8:21 PM Daniel Lezcano
<daniel.lezcano@xxxxxxxxxx> wrote:
Hello Daniel,

Thanks a lot for your review and comments! Please see some reflections below.

> On 09/01/2024 20:19, Alexey Charkov wrote:
> > Include thermal zones information in device tree for rk3588 variants
> > and enable the built-in thermal sensing ADC on RADXA Rock 5B
> >
> > Signed-off-by: Alexey Charkov <alchark@xxxxxxxxx>
> > ---
> > Changes in v2:
> > - Dropped redundant comments
> > - Included all CPU cores in cooling maps
> > - Split cooling maps into more granular ones utilizing TSADC
> > channels 1-3 which measure temperature by separate CPU clusters
> > instead of channel 0 which measures the center of the SoC die
> > ---
> > .../boot/dts/rockchip/rk3588-rock-5b.dts | 4 +
> > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 151 ++++++++++++++++++
> > 2 files changed, 155 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > index a5a104131403..f9d540000de3 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > +++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
> > @@ -772,3 +772,7 @@ &usb_host1_ehci {
> > &usb_host1_ohci {
> > status = "okay";
> > };
> > +
> > +&tsadc {
> > + status = "okay";
> > +};
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > index 8aa0499f9b03..8d54998d0ecc 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
> > @@ -10,6 +10,7 @@
> > #include <dt-bindings/reset/rockchip,rk3588-cru.h>
> > #include <dt-bindings/phy/phy.h>
> > #include <dt-bindings/ata/ahci.h>
> > +#include <dt-bindings/thermal/thermal.h>
> >
> > / {
> > compatible = "rockchip,rk3588";
> > @@ -2112,6 +2113,156 @@ tsadc: tsadc@fec00000 {
> > status = "disabled";
> > };
> >
> > + thermal_zones: thermal-zones {
> > + /* sensor near the center of the whole chip */
> > + soc_thermal: soc-thermal {
> > + polling-delay-passive = <20>;
>
> There is no mitigation set for this thermal zone. It is pointless to
> specify a passive polling.

Indeed, it makes sense to me. There seems to be a catch though in that
the driver calls the generic thermal_of_zone_register during the
initial probe, which expects both of those polling delays to be
present in the device tree, otherwise it simply refuses to add the
respective thermal zone, see drivers/thermal/thermal_of.c:502

> > + polling-delay = <1000>;
>
> The driver is interrupt driven. No need to poll.

Same here as above

> > + sustainable-power = <2100>;
>
> There is no mitigation with this thermal zone. Specifying a sustainable
> power does not make sense.

Thanks, will drop this in v3!

> > + thermal-sensors = <&tsadc 0>;
> > +
> > + trips {
> > + soc_crit: soc-crit {
> > + temperature = <115000>;
> > + hysteresis = <2000>;
>
> This trip point leads to a system shutdown / reboot. It is not necessary
> to specify a hysteresis.

Similar to the above, the generic thermal_of code refuses to add the
trip point if it has no hysteresis property defined (regardless of the
trip type), see drivers/thermal/thermal_of.c:109

> > + type = "critical";
> > + };
> > + };
> > + };
> > +
> > + /* sensor between A76 cores 0 and 1 */
> > + bigcore0_thermal: bigcore0-thermal {
> > + polling-delay-passive = <20>;
> > + polling-delay = <1000>;
>
> The driver is interrupt driven. No need to poll.
>
> > + thermal-sensors = <&tsadc 1>;
> > +
> > + trips {
> > + bigcore0_alert: bigcore0-alert {
> > + temperature = <85000>;
> > + hysteresis = <2000>;
> > + type = "passive";
> > + };
> > + bigcore0_crit: bigcore0-crit {
> > + temperature = <115000>;
> > + hysteresis = <2000>;
> > + type = "critical";
> > + };
> > + };
> > + cooling-maps {
> > + map0 {
> > + trip = <&bigcore0_alert>;
> > + cooling-device =
> > + <&cpu_b0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > + <&cpu_b1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> > + contribution = <1024>;
>
> If you specify the contribution, that means it is expected to use the
> IPA governor. However, this one needs an extra trip point before 'alert'
> to begin collecting temperatures in order to initialize the PID loop of
> the IPA.

Thank you! Will add extra passive cooling trip points at 75C to all
three CPU clusters.

Best regards,
Alexey