Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

From: Alexey Charkov
Date: Fri Mar 01 2024 - 03:26:22 EST


On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <dsimic@xxxxxxxxxxx> wrote:
>
> On 2024-03-01 06:12, Alexey Charkov wrote:
> > On Fri, Mar 1, 2024 at 12:21 AM Dragan Simic <dsimic@xxxxxxxxxxx>
> > wrote:
> >> On 2024-02-29 20:26, Alexey Charkov wrote:
> >> > Include thermal zones information in device tree for RK3588 variants.
> >> >
> >> > This also enables the TSADC controller unconditionally on all boards
> >> > to ensure that thermal protections are in place via throttling and
> >> > emergency reset, once OPPs are added to enable CPU DVFS.
> >> >
> >> > The default settings (using CRU as the emergency reset mechanism)
> >> > should work on all boards regardless of their wiring, as CRU resets
> >> > do not depend on any external components. Boards that have the TSHUT
> >> > signal wired to the reset line of the PMIC may opt to switch to GPIO
> >> > tshut mode instead (rockchip,hw-tshut-mode = <1>;)
> >>
> >> Quite frankly, I'm still not sure that enabling this on the SoC level
> >> is the way to go. As I already described in detail, [4] according to
> >> the RK3588 Hardware Design Guide v1.0 and the Rock 5B schematic, we
> >> should actually use GPIO-based handling for the thermal runaways on
> >> the Rock 5B. Other boards should also be investigated individually,
> >> and the TSADC should be enabled on a board-to-board basis.
> >
> > With all due respect, I disagree, here is why:
> > - Neither the schematic nor the hardware design guide, on which the
> > schematic seems to be based, prescribes a particular way to handle
> > thermal runaways. They only provide the possibility of GPIO based
> > resets, along with the CRU based one
>
> Please note that other documents from Rockchip also exist. Below is
> a link to a screenshot from the Thermal developer guide, version 1.0,
> which describes the whole thing further. I believe it's obvious that
> the thermal runaway is to be treated as a board-level feature.
>
> - https://i.imgur.com/IJ6dSAc.png

Frankly, that still doesn't make TSADC per se a board-level thing IMO.
The only thing that is board-level is the wiring of GPIO based resets,
which I fully agree should go to board .dts for boards that support
it, but that's not part of the current defaults and can be safely
added later.

TSADC is inside the SoC. CRU is inside the SoC. They work just fine
for a thermal reset, even if no dedicated reset logic is wired on the
board. I really don't see any downsides in having TSADC enabled by
default with CRU based resets:
- it's a safe default (i.e. I cannot think of any configuration or use
case where enabled-by-default TSADC does any harm)
- it's safer than accidentally forgetting to enable TSADC (as it adds
thermal protection which is otherwise missing)
- it will work on all boards (even if it doesn't utilize the full
hardware functionality by ignoring GPIO resets that some boards also
have in addition to the CRU)
- and it requires fewer overrides in board .dts files

Sounds like a no-regret move to me.

> To be fair, that version of the Thermal developer guide dates back to
> 2019, meaning that it technically applies to the RK3399, for example,
> but the TSADC and reset circuitry design has basically remained the
> same for the RK3588.
>
> > - My strong belief is that defaults (regardless of context) should be
> > safe and reasonable, and should also minimize the need to override
> > them
>
> Please note that the TSADC is disabled in the RK3399 SoC dtsi, so having
> it disabled in the RK3588(s) SoC dtsi would provide some consistency.

I'm happy to produce a patch to reverse the logic in RK3399 (and any
others for that matter) to also have TSADC enabled by default there,
thus saving several lines of code, if it's just about consistency.

> Though, the RK3399 still does it in a safe way, by moving the OPPs into
> a separate dtsi file, named rk3399-opp.dtsi, which the board dts files
> then include together with enabling the TSADC.
>
> If you agree, let's employ the same approach for the RK3588(s), by
> having
> the its OPPs defined in a separate file, named rk3588s-opp.dtsi, etc.

Separate file for OPPs is a good no-regret move to declutter the SoC
level .dtsi (as the OPP table is long and boring) - happy to move it
regardless of the outcome of the above TSADC discussion. Thanks for
the pointer!

> > - In context of dts/dtsi, as far as I understand the general logic
> > behind the split, the SoC .dtsi should contain all the things that are
> > fully contained within the SoC and do not depend on the wiring of a
> > particular board or its target use case. Boards then
> > add/remove/override settings to match their wiring and use case more
> > closely
>
> Of course, but the thermal shutdown is obviously a board-level feature,
> which I described further above.

Not so obvious to me :-) I don't mean to be stubborn or uncooperative
here, but I really can't find any technical merit in having it enabled
at board level instead of SoC level.

Switching to PMIC-assisted resets is one thing - it definitely should
go to board files, as it depends on the specific wiring of the
TSADC_SHUT signal. Enabling TSADC in a default configuration that can
and will work on all boards regardless of their wiring is another
thing. I'm just arguing for the latter.

To me it seems similar to the watchdog timer situation: we enable it
at the SoC level [1], as it is expected to work in its default
configuration regardless of the board wiring, and it provides
protection against system malfunctions. Doesn't matter if the board or
its userspace code ends up using the full functionality - it just sits
there waiting for its spotlight without hurting anybody.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3588s.dtsi#n1872

Best regards,
Alexey