Re: [PATCH] arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones

From: Matthias Kaehlcke
Date: Thu Sep 03 2020 - 08:34:06 EST


Hi Rajendra,

On Thu, Sep 03, 2020 at 11:00:52AM +0530, Rajendra Nayak wrote:
>
> On 9/3/2020 10:14 AM, Rajendra Nayak wrote:
> >
> > On 9/2/2020 9:02 PM, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak <rnayak@xxxxxxxxxxxxxx> wrote:
> > > >
> > > >
> > > > > * In terms of the numbers here, I believe that you're claiming that we
> > > > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power.  My memory
> > > > > of how much power we could dissipate in previous laptops I worked on
> > > > > is a little fuzzy, but that doesn't seem insane for a passively-cooled
> > > > > laptop.  However, I think someone could conceivably put this chip in a
> > > > > smaller form factor.  In such a case, it seems like we'd want these
> > > > > things to sum up to ~2000 (if it would ever make sense for someone to
> > > > > put this chip in a phone) or ~4000 (if it would ever make sense for
> > > > > someone to put this chip in a small tablet).  It seems possible that,
> > > > > to achieve this, we might have to tweak the
> > > > > "dynamic-power-coefficient".
> > > >
> > > > DPC values are calculated (at a SoC) by actually measuring max power at various
> > > > frequency/voltage combinations by running things like dhrystone.
> > > > How would the max power a SoC can generate depend on form factors?
> > > > How much it can dissipate sure is, but then I am not super familiar how
> > > > thermal frameworks end up using DPC for calculating power dissipated,
> > > > I am guessing they don't.
> > > >
> > > > > I don't know how much thought was put
> > > > > into those numbers, but the fact that the little cores have a super
> > > > > round 100 for their dynamic-power-coefficient makes me feel like they
> > > > > might have been more schwags than anything.  Rajendra maybe knows?
> > > >
> > > > FWIK, the values are always scaled and normalized to 100 for silver and
> > > > then used to derive the relative DPC number for gold. If you see the DPC
> > > > for silver cores even on sdm845 is a 100.
> > > > Again these are not estimations but based on actual power measurements.
> > >
> > > The scaling to 100 doesn't seem to match how the thermal framework is
> > > using them.  Take a look at of_cpufreq_cooling_register().  It takes
> > > the "dynamic-power-coefficient" and passes it as "capacitance" into
> > > __cpufreq_cooling_register().  That's eventually used to compute
> > > power, which is documented in the code to be in mW.
> > >
> > > power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv;
> > > do_div(power, 1000000000);
> > >
> > > /* power is stored in mW */
> > > freq_table[i].power = power;
> > >
> > > That's used together with "sustainable-power", which is the attribute
> > > that Matthias is trying to set.  That value is documented to be in mW
> > > as well.
> > >
> > > ...so if the silver cores are always scaled to 100 regardless of how
> > > much power they actually draw then it'll be impossible to actually
> > > think about "sustainable-power" as a mW value.  Presumably we either
> > > need to accept that fact (and ideally document it) or we need to
> > > change the values for silver / gold cores (we could still keep the
> > > relative values the same and just scale them).
> >
> > That sounds reasonable (still keep the relative values and scale them)
> > I'll get back on what those scaled numbers would look like, and try to
> > get some sense of why this scaling to 100 was done (like you said
> > I don't see any documentation on this), but I see atleast a few other non-qcom
> > SoCs doing this too in mainline (like rockchip/rk3399)
>
> On second thoughts, why wouldn't a relative 'sustainable-power' value work?
> On every device, one would need to do the exercise that Matthias did to come
> up with the OPP at which we can sustain max CPU/GPU loads anyway.

You assume that a thermal zone only has cooling devices of the same type (or
with the same fake unit for power consumption). This falls apart when multiple
types are used in one zone, which is common.
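
To illustrate with made-up numbers (not sc7180 data): assume a zone with a
sustainable power of 3000 mW, a GPU that requests 2000 in real mW, and a CPU
that really wants 2000 mW but reports 1000 because its DPC is scaled by 0.5.
The sketch below is only a simplified proportional split, not the actual
power allocator code, and proportional_grant() is a hypothetical helper:

```c
#include <stdint.h>

/*
 * Simplified proportional split of a power budget between cooling
 * devices. Hypothetical helper for illustration only; the in-kernel
 * governor does more (weights, per-device caps, redistribution).
 */
static uint64_t proportional_grant(uint64_t budget, uint64_t req,
				   uint64_t total_req)
{
	if (!total_req)
		return 0;

	return budget * req / total_req;
}

/*
 * Made-up example, budget = 3000 mW:
 *
 *   both devices in real mW:  CPU req 2000, GPU req 2000
 *       proportional_grant(3000, 2000, 4000) for each device
 *
 *   CPU in fake units (x0.5): CPU req 1000, GPU req 2000
 *       proportional_grant(3000, 1000, 3000) for the CPU, in fake units
 *       proportional_grant(3000, 2000, 3000) for the GPU, in mW
 *
 * In the second case the CPU grant must be multiplied by 2 to get real
 * mW, so the real total ends up above the 3000 mW budget.
 */
```

With the fake unit the governor believes it stays within budget while the
zone actually draws more than that.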

Also, sustainable power is only a derived value. The lying already starts in
the energy model, which is also used by EAS, so a fake unit could cause
further problems there.

> I mean even if we do change the DPC values to match actual power, Matthias would
> still observe that we can sustain at the very same OPP and not any different.
> Its just that the mW values that are passed to kernel are relative and not
> absolute. My worry is that perhaps no SoC vendor wants to put these absolute numbers
> out.

This is pretty much 'security' by obscurity. It would be relatively easy to
measure actual power consumption at different CPU speeds and derive the DPC
values from that.
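
As a rough sketch of that derivation: it is just the formula Doug quoted
above, inverted. The helper names are made up, and the sketch assumes the
measured value is dynamic power only, at a known frequency and voltage:

```c
#include <stdint.h>

/*
 * Forward direction, same arithmetic as the snippet quoted above:
 *   power [mW] = capacitance * f [MHz] * V [mV]^2 / 10^9
 */
static uint64_t dpc_to_power_mw(uint64_t dpc, uint64_t freq_mhz,
				uint64_t voltage_mv)
{
	return dpc * freq_mhz * voltage_mv * voltage_mv / 1000000000ULL;
}

/*
 * Inverse: derive the DPC from a dynamic power measurement at one OPP.
 * Hypothetical helper, rounds to the nearest integer. Callers must
 * pass a non-zero frequency and voltage.
 */
static uint64_t power_mw_to_dpc(uint64_t power_mw, uint64_t freq_mhz,
				uint64_t voltage_mv)
{
	uint64_t denom = freq_mhz * voltage_mv * voltage_mv;

	return (power_mw * 1000000000ULL + denom / 2) / denom;
}
```

In practice one would measure at several OPPs and check that the resulting
values are reasonably consistent.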