Re: [PATCH v2 2/3] cpufreq: qcom-nvmem: Enable virtual power domain devices

From: Stephan Gerhold
Date: Thu Oct 19 2023 - 10:49:08 EST


On Thu, Oct 19, 2023 at 04:12:56PM +0200, Ulf Hansson wrote:
> On Thu, 19 Oct 2023 at 15:05, Stephan Gerhold <stephan@xxxxxxxxxxx> wrote:
> > On Thu, Oct 19, 2023 at 01:26:19PM +0200, Ulf Hansson wrote:
> > > On Thu, 19 Oct 2023 at 12:24, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> > > > On Wed, 18 Oct 2023 at 10:06, Stephan Gerhold
> > > > <stephan.gerhold@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > The genpd core caches performance state votes from devices that are
> > > > > runtime suspended as of commit 3c5a272202c2 ("PM: domains: Improve
> > > > > runtime PM performance state handling"). They get applied once the
> > > > > device becomes active again.
> > > > >
> > > > > To attach the power domains needed by qcom-cpufreq-nvmem the OPP core
> > > > > calls genpd_dev_pm_attach_by_id(). This results in "virtual" dummy
> > > > > devices that use runtime PM only to control the enable and performance
> > > > > state for the attached power domain.
> > > > >
> > > > > However, at the moment nothing ever resumes the virtual devices created
> > > > > for qcom-cpufreq-nvmem. They remain permanently runtime suspended. This
> > > > > means that performance state votes made during cpufreq scaling get
> > > > > always cached and never applied to the hardware.
> > > > >
> > > > > Fix this by enabling the devices after attaching them and use
> > > > > dev_pm_syscore_device() to ensure the power domains also stay on when
> > > > > going to suspend. Since it supplies the CPU we can never turn it off
> > > > > from Linux. There are other mechanisms to turn it off when needed,
> > > > > usually in the RPM firmware (RPMPD) or the cpuidle path (CPR genpd).
> > > >
> > > > I believe we discussed using dev_pm_syscore_device() for the previous
> > > > version. It's not intended to be used for things like the above.
> > > >
> >
> > Sorry, looks like we still had a misunderstanding in the conclusion of
> > the previous discussion. :')
> >
> > > > Moreover, I was under the impression that it wasn't really needed. In
> > > > fact, I would think that this actually breaks things for system
> > > > suspend/resume, as in this case the cpr driver's genpd
> > > > ->power_on|off() callbacks are no longer getting called due this,
> > > > which means that the cpr state machine isn't going to be restored
> > > > properly. Or did I get this wrong?
> > >
> >
> > We strictly need the RPMPDs to be always-on, also across system suspend
> > [1]. The RPM firmware will drop the votes internally as soon as the
> > CPU(s) have entered deep cpuidle. We can't do this from Linux, because
> > we need the CPU to continue running until it was shut down cleanly.
> >
> > For CPR, we strictly need the backing regulator to be always-on, also
> > across system suspend. Typically the hardware will turn off the
> > regulator as soon as the CPU(s) enter deep cpuidle. Similarly, we can't
> > do this from Linux, because we need the CPU to continue running until it
> > was shut down cleanly.
> >
> > My understanding was that we're going to pause the CPR state machine
> > using the system suspend/resume callbacks on the driver, instead of
> > using the genpd->power_on|off() callbacks [2]. I can submit a separate
> > patch for this.
>
> If we are going to do 1) as described below, this looks to me that
> it's going to be needed.
>

Yep.

> How will otherwise the cpr state machine be saved/restored during
> system suspend/resume? Note that, beyond 1), the genpd's
> ->power_on|off() callbacks are no longer going to be called during
> system suspend/resume.
>

(Side note: I think "save/restore" might be the wrong words for
suspend/resume of CPR. Looking at the code most of the configuration
appears to be preserved across suspend/resume. Nothing is saved, it
literally just disables the state machine during suspend and re-enables
it during resume.

I'm not entirely sure what's the reason for doing this. Perhaps the
main goal is just to prevent the CPR state machine from getting stuck
or sending pointless IRQs that won't be handled while Linux is
suspended.)

> In a way this also means that the cpr genpd provider might as well
> also have GENPD_FLAG_ALWAYS_ON set for it.

Conceptually I would consider CPR to be a generic power domain provider
that could supply any kind of device. I know at least of CPUs and GPUs.
We need "always-on" only for the CPU, but not necessarily for other
devices.

For a GPU, the Linux driver (running on the CPU) can stop the GPU, wait
for completion and then invoke the ->power_off() callback of CPR. In
that case it is also safe to disable the backing regulator from Linux.
(I briefly mentioned this already in the previous discussion I think.)

We could set GENPD_FLAG_ALWAYS_ON for the CPR compatibles where we know
that they are only used to supply CPUs, but if we're going to do (1)
anyway there might not be much of an advantage for the extra complexity.

>
> >
> > I didn't prioritize this because QCS404 (as the only current user of
> > CPR) doesn't have proper deep cpuidle/power management set up yet. It's
> > not entirely clear to me if there is any advantage (or perhaps even
> > disadvantage) if we pause the CPR state machine while the shared L2
> > cache is still being actively powered by the CPR power rail during
> > system suspend. I suspect this is a configuration that was never
> > considered in the hardware design.
>
> I see.
>
> >
> > Given the strict requirement for the RPMPDs, I only see two options:
> >
> > 1. Have an always-on consumer that prevents the power domains to be
> > powered off during system suspend. This is what this patch tries to
> > achieve.
> >
> > Or:
> >
> > 2. Come up with a way to register the RPMPDs used by the CPU with
> > GENPD_FLAG_ALWAYS_ON. This would also be doable, but isn't as
> > straightfoward as "regulator-always-on" in the DT because the rpmpd
> > DT node represents multiple genpds in a single DT node [3].
>
> Yes, it sounds like it may be easier to do 1).
>
> >
> > What do you think? Do you see some other solution perhaps? I hope we can
> > clear up the misunderstanding. :-)
>
> Yes, thanks!
>
> >
> > [1]: https://lore.kernel.org/linux-arm-msm/ZQGqfMigCFZP_HLA@xxxxxxxxxxx/
> > [2]: https://lore.kernel.org/linux-arm-msm/CAPDyKFoiup8KNv=1LFGKDdDLA1pHsdJUgTTWMdgxnikEmReXzg@xxxxxxxxxxxxxx/
> > [3]: https://lore.kernel.org/linux-arm-msm/ZSg-XtwMxg3_fWxc@xxxxxxxxxxx/
> >
> > > BTW, if you really need something like the above, the proper way to do
> > > it would instead be to call device_set_awake_path() for the device.
> > >
> > > This informs genpd that the device needs to stay powered-on during
> > > system suspend (assuming that GENPD_FLAG_ACTIVE_WAKEUP has been set
> > > for it), hence it will keep the corresponding PM domain powered-on
> > > too.
> > >
> >
> > Thanks, I can try if this works as alternative to the
> > dev_pm_syscore_device()!
>
> Yes, please. We don't want to abuse the dev_pm_syscore_device() thingy.
>

Could you clarify the idea behind GENPD_FLAG_ACTIVE_WAKEUP? Would I set
it conditionally for all RPMPDs or just the ones consumed by the CPU?
How does the genpd *provider* know if one of its *consumer* devices
needs to have its power domain kept on for wakeup?

Thanks!
Stephan