Re: [PATCH 1/4] cpufreq: qcom-nvmem: Enable virtual power domain devices

From: Stephan Gerhold
Date: Tue Oct 17 2023 - 16:54:15 EST


On Mon, Oct 16, 2023 at 04:47:52PM +0200, Ulf Hansson wrote:
> [...]
> > > >
> > > > Here are the two commits with my current DT changes (WIP):
> > > > - MSM8909+PM8909 (RPMPD only):
> > > > https://github.com/msm8916-mainline/linux/commit/791e0c5a3162372a0738bc7b0f4a5e87247923db
> > >
> > > Okay, so this looks pretty straightforward. One thing though, it
> > > looks like we need to update the DT bindings for cpus.
> > >
> > > I recently updated Documentation/devicetree/bindings/arm/cpus.yaml
> > > [1], to let "perf" be the common "power-domain-name" for a CPU's SCMI
> > > performance-domain. It looks like we should extend the description to
> > > allow "perf" to be used for all types of performance domains.
> > >
> >
> > "perf" sounds fine for a single power domain, I just used "apc" here for
> > consistency with the MSM8916 changes (which scale this power domain and
> > several others, as you saw).
> >
> > (BTW, I would appreciate such a generic name for the cpuidle case as
> > well, so "idle" instead of "psci" vs "sbi". I have another WIP cpuidle
> > driver and didn't want to invent another name there...)
>
> Whether it's "idle" or "power" or something else, we should certainly
> avoid a provider-specific (psci) name, as has been pointed out earlier
> by Rob too.
>
> I will try to get some time to update the DT docs as soon as I can.
> Unless you get to it first, feel free to do it.
>

Thanks! I'm not sure either when I will have time to get back to the
cpuidle driver, so let's just see who finds time first. :D

> > [MSM8916 setup with multiple power domains...]
> > There does indeed seem to be some kind of relation between MX and CX/APC:
> >
> > - When voting for CX, the RPM firmware will always implicitly
> > adjust the MX performance state so that MX >= CX.
> >
> > - When scaling APC up, we must increase MX before APC.
> > - When scaling APC down, we must decrease MX after APC.
> > => Clearly MX >= APC. Not in terms of raw voltage, but at least for the
> > abstract performance state.
> >
> > Is this some kind of parent-child relationship between MX <=> CX and
> > MX <=> APC?
>
> Thanks for sharing the above. Yes, to me, it looks like there is a
> parent/child-domain relationship that could be worth describing/using.
>
> >
> > If so, maybe we could indeed bind MX to the CPR genpd somehow. They use
> > different performance state numbering, so we need some kind of
> > translation. I'm not entirely sure how that would be described.
>
> Both the power-domain and the required-opps DT bindings
> (Documentation/devicetree/bindings/opp/opp-v2-base.yaml) already allow
> us to describe these kinds of hierarchical
> dependencies/layouts.
>
> In other words, to scale performance for a child domain, the child may
> rely on the parent domain being scaled too. This is
> already supported by genpd and the opp library - so it should
> just work. :-)
>

Oh! I have looked at that code in the genpd core a few times already,
but until now I never understood how it works. That's great, thanks!

I will test this and get back to you separately.
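
Concretely, if I read the genpd code right, the provider side would
boil down to registering MX as the parent domain, something like this
(untested sketch, the function name is made up by me):

#include <linux/pm_domain.h>

/*
 * Make APC a subdomain of MX. genpd should then keep the aggregated
 * MX vote at least as high as APC's current state requires, and
 * translate between the two performance state numberings via the
 * required-opps in the OPP tables (dev_pm_opp_xlate_performance_state()).
 */
static int msm8916_add_apc_under_mx(struct generic_pm_domain *mx_pd,
				    struct generic_pm_domain *apc_pd)
{
	return pm_genpd_add_subdomain(mx_pd, apc_pd);
}

(The MX >= CX part seems to be handled implicitly by the RPM firmware
already, so this would really just be needed for APC.)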

Seems like we reached a conclusion on enabling the power domains at
least, which will already help me a lot with MSM8909. :-)

> [...]
>
> > >
> > > *) The approach you have taken in the $subject patch with the call to
> > > pm_runtime_resume_and_get() works as a fix for QCS404, as there is
> > > only the CPR to attach to. The problem with it is that it doesn't
> > > work for cases where the RPMPD is used for performance scaling, either
> > > separately or in combination with the CPR. It would keep the RPMPDs
> > > powered on, which would be wrong. Regarding the
> > > dev_pm_syscore_device() thingy, this should not be needed, as long as
> > > we keep the vdd-apc-supply enabled, right?
> > >
> > > **) A more generic solution that would work for all cases (even
> > > when/if hooking up the CPR to the RPMPDs) consists of tweaking genpd
> > > to avoid "caching" performance states for these kinds of devices. And
> > > again, I don't see that we need dev_pm_syscore_device(), assuming we
> > > manage the vdd-apc-supply correctly.
> > >
> > > Did I miss anything?
> > >
> >
> > We do need to keep the CPU-related RPMPDs always-on too.
> >
> > Keeping them always-on is a bit counter-intuitive, but it's because
> > of this:
> >
> > > > > > - RPMPD: This is the generic driver for all the SoC power domains
> > > > > > managed by the RPM firmware. It's not CPU-specific. However, as a
> > > > > > special feature, each power domain is exposed twice in Linux, e.g.
> > > > > > "MSM8909_VDDCX" and "MSM8909_VDDCX_AO". The _AO ("active-only")
> > > > > > variant tells the RPM firmware that the performance/enable vote only
> > > > > > applies when the CPU is active (not in deep cpuidle state).
> >
> > The CPU only uses the "_AO"/active-only variants in RPMPD. Keeping these
> > always-on effectively means "keep the power domain on as long as the CPU
> > is active".
> >
> > I hope that clears up some of the confusion. :)
>
> Yes, it does, thanks! Is the below the correct conclusion for how we
> could move forward then?
>
> *) The pm_runtime_resume_and_get() approach works for QCS404 as a fix.
> It also works fine when there is only one RPMPD that manages the
> performance scaling.
>

Agreed.
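
For reference, my understanding of the essence of the $subject
approach, as a simplified sketch (error handling trimmed, helper name
made up by me; the actual attach call in the driver may look
different):

#include <linux/err.h>
#include <linux/pm_domain.h>
#include <linux/pm_runtime.h>

/* Attach the CPU's power domain and keep it resumed for as long as
 * the cpufreq driver is bound. */
static struct device *attach_and_resume_pd(struct device *cpu_dev,
					   const char *name)
{
	struct device *virt_dev;
	int ret;

	virt_dev = dev_pm_domain_attach_by_name(cpu_dev, name);
	if (IS_ERR_OR_NULL(virt_dev))
		return virt_dev ?: ERR_PTR(-ENODEV);

	ret = pm_runtime_resume_and_get(virt_dev);
	if (ret) {
		dev_pm_domain_detach(virt_dev, false);
		return ERR_PTR(ret);
	}

	return virt_dev;
}

remove() then just needs the matching pm_runtime_put() and
dev_pm_domain_detach().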

> **) In cases where we have multiple PM domains to scale performance
> for, using pm_runtime_resume_and_get() would work fine too. Possibly
> we want to use device_link_add() to set up suppliers, to avoid calling
> pm_runtime_resume_and_get() for each and every device.
>

Hm. What would you use as the "supplied" device? The CPU device, I guess?

I'm looking again at my old patch from 2020 where I implemented this
with device links in the OPP core. Seems like you suggested this back
then too :)

https://lore.kernel.org/linux-pm/20200826093328.88268-1-stephan@xxxxxxxxxxx/

However, for the special case of the CPU, I think we don't gain any code
simplification from using device links. There will just be a single
resume of each virtual genpd device, plus one put during remove().
Exactly the same applies when using device links: we need to set up the
device links once for each virtual genpd device, and clean them up again
during remove().
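
To spell that out, both variants boil down to one call per virtual
genpd device (sketch with made-up function names; virt_dev is what
genpd hands back when attaching, cpu_dev is the consumer):

#include <linux/device.h>
#include <linux/pm_runtime.h>

/* a) device link: torn down with device_link_del() in remove() */
static int use_device_link(struct device *cpu_dev, struct device *virt_dev)
{
	struct device_link *link;

	link = device_link_add(cpu_dev, virt_dev,
			       DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME |
			       DL_FLAG_RPM_ACTIVE);
	return link ? 0 : -ENODEV;
}

/* b) direct resume: balanced by pm_runtime_put() in remove() */
static int use_pm_runtime(struct device *virt_dev)
{
	return pm_runtime_resume_and_get(virt_dev);
}

So roughly the same amount of code either way.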

Or can you think of another advantage of using device links?

> ***) Due to the above, we don't need a new mechanism to avoid
> "caching" performance states for genpd. At least for the time being.
>

Right. Given *) and **), I'll prepare a v2 of the $subject patch with the
remove() cleanup fixed and an improved commit description.

I'll wait for a bit in case you have more thoughts about the device
links.

Thanks!
Stephan