Re: [PATCH v3 0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

From: Dragan Simic
Date: Wed Mar 13 2024 - 13:40:21 EST


Hello Sebastian,

On 2024-03-13 17:39, Sebastian Reichel wrote:
On Thu, Mar 07, 2024 at 11:16:20PM +0100, Sebastian Reichel wrote:
On Thu, Mar 07, 2024 at 04:38:24PM +0400, Alexey Charkov wrote:
> On Tue, Mar 5, 2024 at 12:06 PM Alexey Charkov <alchark@xxxxxxxxx> wrote:
> > On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
> > <sebastian.reichel@xxxxxxxxxxxxx> wrote:
> > > I'm too busy to have a detailed review of this series right now, but
> > > I pushed it to our CI and it results in a board reset at boot time:
> > >
> > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
> > >
> > > I also pushed just the first three patches (i.e. without OPP /
> > > cpufreq) and that boots fine:
> > >
> > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953
> >
> > Thank you for testing these! I've noticed in the boot log that the CI
> > machine uses some u-boot 2023.07 - is that a downstream one? Any
> > chance to compare it to 2023.11 or 2024.01 from your (Collabora)
> > integration tree?
> >
> > I use 2023.11 from your integration tree, with a binary bl31, and I'm
> > not getting those resets even under prolonged heavy load (I rebuild
> > Chromium with 8 concurrent compilation jobs as the stress test -
> > that's 14 hours of heavy CPU, memory and IO use). Would be interesting
> > to understand if it's just a 'lucky' SoC specimen on my side, or if
> > there is some dark magic happening differently on my machine vs. your
> > CI machine.
> >
> > Thinking that maybe if your CI machine uses a downstream u-boot it
> > might be leaving some extra hardware running (PVTM?) which might do
> > weird stuff when TSADC/clocks/voltages get readjusted by the generic
> > cpufreq driver?..
> >
> > > Note, that OPP / cpufreq works on the same boards in the CI when
> > > using the ugly-and-not-for-upstream cpufreq driver:
> > >
> > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
> > >
> > > My best guess right now is, that this is related to the generic
> > > driver obviously not updating the GRF read margin registers.
> >
> > If it was about memory read margins I believe I would have been
> > unlikely to get my machine to work reliably under heavy load with the
> > default ones, but who knows...
>
> Sebastian's report led me to investigate further how all those things
> are organized in the downstream code and in hardware, and what could
> be a pragmatic way forward with upstream enablement. It turned out to
> be quite a rabbit hole frankly, with multiple layers of abstraction
> and intertwined code in different places.
>
> Here's a quick summary for future reference:
> - CPU clocks on RK3588 are ultimately managed by the ATF firmware,
> which provides an SCMI service to expose them to the kernel
> - ATF itself doesn't directly set any clock frequencies. Instead, it
> accepts a target frequency via SCMI and converts it into an oscillator
> ring length setting for the PVPLL hardware block (via a fixed table
> lookup). At least that's how it's done in the recently released TF-A
> bl31 code [1] - perhaps the binary bl31 does something similar
> - U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
> - PVPLL produces a reference clock to feed to the CPUs, which depends
> on the configured oscillator ring length but also on the supply
> voltage, silicon quality and perhaps temperature too. ATF doesn't know
> anything about voltages or temperatures, so it doesn't guarantee that
> the requested frequency is matched by the hardware
> - PVPLL frequency generation is bypassed for lower-frequency OPPs, in
> which case the target frequency is directly fed by the ATF to the CRU.
> This happens for both big-core and little-core frequencies below 816
> MHz
> - Given that requesting a particular frequency via SCMI doesn't
> guarantee that it will be what the CPUs end up running at, the vendor
> kernel also does a runtime voltage calibration for the supply
> regulators, by adjusting the supply voltage in minimum regulator steps
> until the frequency reported by PVPLL gets close to the requested one
> [2]. It then overwrites OPP provided voltage values with the
> calibrated ones
> - There's also some trickery with preselecting OPP voltage sets using
> the "-Lx" suffix based on silicon quality, as measured by a "leakage"
> value stored in an NVMEM cell and/or the PVTM frequency generated at a
> reference "midpoint" OPP [3]. Better performing silicon gets to run at
> lower default supply voltages, thus saving power
> - Once the OPPs are selected and calibrated, the only remaining
> trickery is the two supply regulators per each CPU cluster (one for
> the CPUs and the other for the memory interface)
> - Another catch, as Sebastian points out, is that memory read margins
> must be adjusted whenever the memory interface supply voltage crosses
> certain thresholds [4]. This has little to do with CPUs or
> frequencies, and is only tangentially related to them due to the
> dependency chain between the target CPU frequency -> required CPU
> supply voltage -> matching memory interface supply voltage -> required
> read margins
> - At reset the ATF switches all clocks to the lowest 408 MHz [6], so
> setting it to anything in kernel code (as the downstream driver does)
> seems redundant
>
> All in all, it does indeed sound like Collabora's CI machine boot-time
> resets are most likely caused by the missing memory read margin
> settings in my patch series. Voltage values in the OPPs I used are the
> most conservative defaults of what the downstream DT has, and PVPLL
> should be able to generate reasonable clock speeds with those (albeit
> likely suboptimal, due to them not being tuned to the particular
> silicon specimen). And there is little else to differ frankly.
>
> As for the way forward, it would be great to know the opinions from
> the list. My thinking is as follows:
> - I can introduce memory read margin updates as the first priority,
> leaving voltage calibration and/or OPP preselection for later (as
> those should not affect system stability at current default values,
> perhaps only power efficiency to a certain extent)
> - CPUfreq doesn't sound like the right place for those, given that
> they have little to do with either CPU or freq :)
> - I suggest a custom regulator config helper to plug into the OPP
> layer, as is done for TI OMAP5 [6]. At first, it might be only used
> for looking up and setting the correct memory read margin value
> whenever the cluster supply voltage changes, and later the same code
> can be extended to do voltage calibration. In fact, OMAP code is there
> for a very similar purpose, but in their case optimized voltages are
> pre-programmed in efuses and don't require runtime recalibration
> - Given that all OPPs in the downstream kernel list identical
> voltages for the memory supply as for the CPU supply, I don't think it
> makes much sense to customize the cpufreq driver per se.
> Single-regulator approach with the generic cpufreq-dt and regulator
> coupling sounds much less invasive and thus lower-maintenance

Sorry for my late response.

When doing some more tests I noticed, that the CI never build the
custom driver and thus never did any CPU frequency scaling at all.
I only used it for my own tests (on RK3588 EVB1). When enabling the
custom driver, the CI has the same issues as your series. So my
message was completely wrong, sorry about that.

Regarding U-Boot: The CI uses "U-Boot SPL 2023.07-rc4-g46349e27";
the last part is the git hash. This is the exact U-Boot source tree
being used:

https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot/-/commits/46349e27/

This was one of the first U-Boot trees with Rock 5B Ethernet support
and is currently flashed to the SPI flash memory of the CI boards.
The vendor U-Boot tree is a lot older. Also it is still using the
Rockchip binary BL31. We have plans to also CI boot test U-Boot,
but currently nobody has time to work on this. I don't think there should
be any relevant changes between upstream 2023.07 and 2023.11 that could
explain this. But it's the best lead now, so I will try to find some time
for doing further tests related to this in the next days.

Regarding the voltage calibration - One option would be to do this
calibration at boot time (i.e. in U-Boot) and update the voltages
in DT accordingly.

After some more debugging I finally found the root cause. The CI
boards were powered from a USB hub using a USB-A to USB-C cable, so
that the team could access maskrom mode. Since I was not involved in
setting them up, I was not aware of that. It effectively limits the
power draw to 500 or 900 mA (depending on USB port implementation),
which is not enough to power the board with the higher frequencies.
The KernelCI Rock 5B boards are now switched to proper power
supplies and the issues are gone.

Sorry for the false alarm,

Great to know, thanks for the clarification.