Re: [PATCH v2] opp: Power on (virtual) power domains managed by the OPP core

From: Stephan Gerhold
Date: Fri Sep 11 2020 - 11:45:41 EST


On Fri, Sep 11, 2020 at 02:55:38PM +0530, Viresh Kumar wrote:
> > As mentioned in my other mail [1] it turns out I actually have such a
> > use case. I briefly explained it in [2], basically the clock that
> > provides higher CPU frequencies has some voltage requirements that
> > should be voted for using a power domain.
> >
> > The clock that provides the lower CPU frequencies has no such
> > requirement, so I need to scale (and power on) a power domain only for
> > some of the OPPs.
> >
> > [1]: https://lore.kernel.org/linux-pm/20200831154938.GA33622@xxxxxxxxxxx/
> > [2]: https://lore.kernel.org/linux-arm-msm/20200910162610.GA7008@xxxxxxxxxxx/
> >
> > So I think it would be good to discuss this use case first before we
> > decide on this patch (how to enable power domains managed by the OPP
> > core). I think there are two problems that need to be solved:
> >
> > 1. How can we drop our performance state votes for some of the OPPs?
> > I explained that problem earlier already:
> >
> > >
> > > I was thinking about something like that, but can you actually drop
> > > your performance state vote for one of the power domains using
> > > "required-opps"?
> > >
> > > At the moment it does not seem possible. I tried adding a special OPP
> > > using opp-level = <0> to reference that from required-opps, but the OPP
> > > core does not allow this:
> > >
> > > vddcx: Not all nodes have performance state set (7: 6)
> > > vddcx: Failed to add OPP table for index 0: -2
>
> This should fix it.
>
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 214c1619b445..2483e765318a 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -2117,9 +2117,6 @@ int dev_pm_opp_xlate_performance_state(struct opp_table *src_table,
> int dest_pstate = -EINVAL;
> int i;
>
> - if (!pstate)
> - return 0;
> -
> /*
> * Normally the src_table will have the "required_opps" property set to
> * point to one of the OPPs in the dst_table, but in some cases the
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index e72753be7dc7..1a9cb96484bb 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -842,7 +842,7 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
> static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
> {
> struct device_node *np;
> - int ret, count = 0, pstate_count = 0;
> + int ret, count = 0;
> struct dev_pm_opp *opp;
>
> /* OPP table is already initialized for the device */
> @@ -876,20 +876,13 @@ static int _of_add_opp_table_v2(struct device *dev, struct opp_table *opp_table)
> goto remove_static_opp;
> }
>
> - list_for_each_entry(opp, &opp_table->opp_list, node)
> - pstate_count += !!opp->pstate;
> -
> - /* Either all or none of the nodes shall have performance state set */
> - if (pstate_count && pstate_count != count) {
> - dev_err(dev, "Not all nodes have performance state set (%d: %d)\n",
> - count, pstate_count);
> - ret = -ENOENT;
> - goto remove_static_opp;
> + list_for_each_entry(opp, &opp_table->opp_list, node) {
> + if (opp->pstate) {
> + opp_table->genpd_performance_state = true;
> + break;
> + }
> }
>
> - if (pstate_count)
> - opp_table->genpd_performance_state = true;
> -
> return 0;
>
> remove_static_opp:
>

Thanks, I will test this early next week!

> > 2. Where/when to enable the power domains: I need to enable the power
> > domain whenever I have a vote for a performance state. In the example
> > above the power domain should get enabled for >= 998 MHz and disabled
> > otherwise.
>

I will answer your questions for my use case, maybe Ulf or someone else
who knows more about power domains can chime in and say more about the
general case:

> - Why do you need to enable these power domains like this ?

At the moment power domains seem to have two independent states:
they can be powered on/off and they can have a performance state.
Both seem to be managed independently, e.g. a power domain can have a
non-zero performance state set but at the same time powered off.

The OPP core only calls dev_pm_genpd_set_performance_state(). No matter
if the power domain is powered on or not this will end up in the
set_performance_state() callback of the power domain driver.

And this is where behavior varies depending on the power domain driver.
Some drivers (e.g. qcom-cpr) will set the performance state even if
powered off, others (in my case: qcom rpmpd) will ignore the performance
state until their power_on() callback is called.

In general, I would expect that a power domain should be powered on
whenever there is at least one performance state vote > 0.
This does not seem to be the case at the moment, though.

> - What will happen if you don't enable them at all ?

If I add the power domains needed for CPUFreq to the CPU OPP table,
without further changes, nothing will power on the power domains.
There is no difference for qcom-cpr, but at least rpmpd will ignore all
the performance state votes.

> - What will happen if you enable them for ever ?
>
> AFAIU, these are kind of virtual domains which are there just to vote in behalf
> of the OS. Only if the accumulated vote is greater than zero, the actual power
> domain will start consuming power. Otherwise it should be disabled.
>
> Or is that wrong ?
>

Right, at least in case of rpmpd the actual power domain is managed by
some "RPM" remote processor which accumulates votes from all the CPUs
running in the SoC.

I believe the power domains in my case are even always-on in the RPM,
but still the RPM interface seems to provide a way to vote for
enabling/disabling the power domains. Sadly it is barely documented,
so I'm not sure what happens if we keep the power domains permanently
powered on. Maybe it will block certain low power modes?

The current implementation in drivers/soc/qcom/rpmpd.c does suggest that
we should vote for "power off" when the power domains are no longer
needed by Linux.

(+CC Bjorn, Andy and linux-arm-msm, maybe they know more...)

Thanks!
Stephan