Re: [PATCH v3 1/5] arm64: dts: rockchip: enable built-in thermal monitoring on RK3588

From: Dragan Simic
Date: Fri Mar 01 2024 - 08:11:21 EST


Hello Chen-Yu,

On 2024-03-01 13:02, Chen-Yu Tsai wrote:
On Fri, Mar 1, 2024 at 7:10 PM Alexey Charkov <alchark@xxxxxxxxx> wrote:
On Fri, Mar 1, 2024 at 12:52 PM Dragan Simic <dsimic@xxxxxxxxxxx> wrote:
> On 2024-03-01 09:25, Alexey Charkov wrote:
> > On Fri, Mar 1, 2024 at 9:51 AM Dragan Simic <dsimic@xxxxxxxxxxx> wrote:
> Thus, who knows what might (or might not) go wrong if we don't reset the
> PMIC at the same time when the CRU resets the SoC? Unfortunately,
> things aren't that straightforward.
>
> On top of that, some boards, such as the Rock 5B, use a few additional
> discrete voltage regulators instead of a master-slave PMIC configuration,
> which may actually introduce some weird power-related issues, which also
> may be intermittent. Actually, I've already overheard that the Rock 5B
> experiences some issues of that nature, but I don't know the details.

Those discrete regulators seem to be out of scope of this discussion.

I agree that a deeper power-cycle with proper power-up sequence to
follow it is better when it's available in the respective hardware.
I'm also happy to provide a follow-up patch to switch from CRU to PMIC
resets for the boards I found to support the latter.

The question we have at hand is solely about the default behavior for
a hypothetical new board with minimal .dts, or an existing board where
we can't determine the wiring of the TSHUT signal:
Option 1. Let them stay nice and warm at 120C+ under load, because
they should have known better and should have enabled the TSADC in
their device tree before putting the system under load
Option 2. Get them passively cooled at 85C under load even with no
heatsink, then force a CRU reset out of an abundance of caution at 120C
unless they defined PMIC reset in their device tree

I'm advocating for the latter.
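
For illustration only, Option 2 could look roughly like the fragment
below at the device-tree level. This is a sketch, not the actual patch:
it assumes the SoC .dtsi enables the TSADC with a CRU reset as the
default, and that a board whose TSHUT wiring is known overrides the mode
in its own .dts. The polarity value is a board-specific placeholder, and
the 85C passive trip points in the thermal zones are not shown.

```dts
/* SoC-level default (sketch): TSADC enabled, hardware shutdown via CRU reset */
&tsadc {
	rockchip,hw-tshut-temp = <120000>;	/* millidegrees Celsius, i.e. 120C */
	rockchip,hw-tshut-mode = <0>;		/* 0: CRU reset, 1: GPIO (TSHUT pin) */
	status = "okay";
};

/* Board-level override (sketch): only where TSHUT is known to reach the PMIC */
&tsadc {
	rockchip,hw-tshut-mode = <1>;		/* 1: GPIO, i.e. reset via the TSHUT pin */
	rockchip,hw-tshut-polarity = <0>;	/* 0: active low, 1: active high; assumed here */
};
```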

FWIW, the CRU reset is what the kernel uses for rebooting the system,
either during a reboot or a kernel panic. So it is already used for both
normal and abnormal scenarios. And yes, it sometimes leaves regulators
or other parts of the system in some weird state that the BROM isn't
expecting.

According to drivers/mfd/rk8xx-core.c, some PMICs (RK809 and RK817, to
be precise) already support taking over the board resets when configured
with "rockchip,system-power-controller". Perhaps we should do the same
with the RK806, to avoid any possible issues with CRU-based board resets;
I'll investigate that further.
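
As a rough sketch of what that looks like on a board with one of the
already supported PMICs, the property is simply added to the PMIC node.
The I2C bus label and the omitted properties below are assumptions made
for illustration; a real board .dts also needs the interrupts, supplies
and regulators required by the binding.

```dts
/* Hypothetical board fragment: RK809 handling system power-off and resets */
&i2c0 {
	pmic@20 {
		compatible = "rockchip,rk809";
		reg = <0x20>;
		rockchip,system-power-controller;
		/* interrupts, supplies and regulators omitted for brevity */
	};
};
```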

Not all Rockchip PMICs (RK808, for example) support software-initiated
resets, unfortunately. According to the RK806 datasheet, it seems capable
of that; see pages 27 and 28 in version 1.0 of the datasheet.

Why should a hardware triggered reset be any different?

According to the RK806 datasheet, resetting through PMIC(s) causes the
PMIC(s) to cut the power rails in a controlled way, i.e. with the expected
ramp-downs and sequencing, and the SoC then wakes up in a state of the
regulators that's exactly the same as when it gets powered up on cold boot.
Doing it that way should be better.

The reset procedure _should_ be virtually the same for all Rockchip PMICs,
but please don't take my word on that. Resets are described quite poorly
in some PMIC datasheets.