Re: [PATCH v2 0/4] PM: domains: Avoid boilerplate code for DVFS in subsystem/drivers

From: Ulf Hansson
Date: Fri Jun 04 2021 - 06:59:54 EST


On Fri, 4 Jun 2021 at 10:23, Stephan Gerhold <stephan@xxxxxxxxxxx> wrote:
>
> On Fri, Jun 04, 2021 at 09:18:45AM +0200, Ulf Hansson wrote:
> > On Thu, 3 Jun 2021 at 19:16, Stephan Gerhold <stephan@xxxxxxxxxxx> wrote:
> > >
> > > On Thu, Jun 03, 2021 at 05:27:30PM +0200, Ulf Hansson wrote:
> > > > On Thu, 3 Jun 2021 at 13:13, Stephan Gerhold <stephan@xxxxxxxxxxx> wrote:
> > > > > I think this might also go into the direction of my problem with the OPP
> > > > > core for CPU DVFS [1] since the OPP core currently does not "power-on"
> > > > > the power domains, it just sets a performance state. I got kind of stuck
> > > > > with all the complexity of power domains in Linux so I think we never
> > > > > solved that.
> > > >
> > > > Hmm, that issue is in a way related.
> > > >
> > > > Although, if I understand correctly, that was rather about at what
> > > > layer it makes best sense to activate the device (from runtime PM
> > > > point of view). And this was needed due to the fact that the
> > > > > corresponding genpd provider requires the PM domain to be powered on to
> > > > allow changing a performance state for it. Did I get that correct?
> > > >
> > >
> > > Yes, mostly. But I guess I keep coming back to the same question:
> > >
> > > When/why does it make sense to vote for a "performance state" of
> > > a power domain that is or might be powered off?
> > >
> > > "Powered off" sounds like the absolutely lowest possible performance
> > > state to me, it's just not on at all. And if suddenly a device comes and
> > > says "I want performance state X", nothing can change until the power
> > > domain is also "powered on".
> > >
> > > I think my "CPU DVFS" problem only exists because in many other
> > > situations it's possible to rely on one of the following side effects:
> > >
> > > 1. The genpd provider does not care if it's powered on or not.
> > > (i.e. it's always-on or implicitly powers on if state > 0).
> > > 2. There is some other device that votes to keep the power domain on.
> > >
> > > And that's how the problem relates to my comment for this patch series ...
> > >
> > > >
> > > > >
> > > > > Do I understand your patch set correctly that you basically make the
> > > > > performance state votes conditional to the "power-on" vote of the device
> > > > > (which is automatically toggled during runtime/system PM)?
> > > >
> > > > The series can be considered as a step in that direction, but no, this
> > > > series doesn't change that behaviour.
> > > >
> > > > Users of dev_pm_genpd_set_performance_state() are still free to set a
> > > > performance state, orthogonally to whether the PM domain is powered on
> > > > or off.
> > > >
> > > > >
> > > > > If yes, I think that's a good thing. It was always really confusing to me
> > > > > that a device can make performance state votes if it doesn't actually
> > > > > want the power domain to be powered on.
> > > >
> > > > I share your view, it's a bit confusing.
> > > >
> > > > Just adding the condition internally to genpd to prevent the caller of
> > > > > dev_pm_genpd_set_performance_state() from succeeding to set a new state,
> > > > unless the genpd is powered on, should be a rather simple thing to
> > > > add.
> > > >
> > > > However, to change this, we first need to double check that all the
> > > > callers are making sure they have turned on the PM domain (typically
> > > > via runtime PM).
> > > >
> > >
> > > ... because if performance state votes would be conditional to the
> > > "power-on" vote of the device, it would no longer be possible
> > > to rely on the side effects mentioned above. So this would most
> > > certainly break some code that (incorrectly?) relies on these side
> > > effects, but would also prevent such code.
> >
> > Right. I understand your point and I am open to discuss an
> > implementation. Although, I suggest we continue that separately from
> > the $subject series.
> >
> > >
> > > My (personal) feeling so far is that just dropping performance votes
> > > during runtime/system suspend just makes the entire situation even more
> > > confusing.
> >
> > Well, that's what most subsystems/drivers need to do.
> >
> > Moreover, we have specific devices that only use one default OPP [1].
> >
> > >
> > > > >
> > > > > What happens if a driver calls dev_pm_genpd_set_performance_state(...)
> > > > > while the device is suspended? Will that mess up the performance state
> > > > > when the device resumes?
> > > >
> > > > Good question. The idea is:
> > > >
> > > > > If genpd in genpd_runtime_suspend() is able to drop an existing vote
> > > > for a performance state, it should restore the vote in
> > > > genpd_runtime_resume(). This also means, if there is no vote to drop
> > > > in genpd_runtime_suspend(), genpd should just leave the vote as is in
> > > > genpd_runtime_resume().
> > > >
> > >
> > > But the next time the device enters runtime suspend that vote would be
> > > dropped, wouldn't it? That feels kind of strange to me.
> >
> > What do you mean by "next time"?
> >
>
> Basically just like:
>
> <device runtime-suspended>
> driver does dev_pm_genpd_set_performance_state(...)
> - performance state is applied immediately, even though device does
> apparently not actually want the power domain to be powered on
> <device runtime resumes>
> - performance state is kept
> <device runtime suspends>
> - performance state is dropped

Yep, this is what would happen.
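
To make the sequence concrete, here is a minimal userspace sketch of it.
This is only a model under stated assumptions, not genpd code: struct
model_dev and all model_*() functions are hypothetical names made up for
illustration, and a performance state of 0 is assumed to mean "no vote".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of one device in a PM domain; not kernel code. */
struct model_dev {
	unsigned int perf_state;  /* current performance state vote, 0 = no vote */
	unsigned int saved_state; /* vote dropped at suspend, 0 = nothing to restore */
	bool suspended;
};

/* Models a driver's vote: applied at once, whether or not the device is suspended. */
void model_set_perf_state(struct model_dev *d, unsigned int state)
{
	d->perf_state = state;
}

/* Models runtime suspend: remember the existing vote (if any) and drop it. */
void model_runtime_suspend(struct model_dev *d)
{
	d->saved_state = d->perf_state;
	d->perf_state = 0;
	d->suspended = true;
}

/* Models runtime resume: restore a dropped vote, otherwise leave the vote as is. */
void model_runtime_resume(struct model_dev *d)
{
	if (d->saved_state) {
		d->perf_state = d->saved_state;
		d->saved_state = 0;
	}
	d->suspended = false;
}

/* Replays the sequence quoted above and returns the final vote. */
unsigned int model_replay_sequence(void)
{
	struct model_dev d = { .perf_state = 0, .saved_state = 0, .suspended = true };

	model_set_perf_state(&d, 3);   /* vote while runtime suspended */
	assert(d.perf_state == 3);     /* applied immediately */

	model_runtime_resume(&d);
	assert(d.perf_state == 3);     /* nothing was dropped, so the vote is kept */

	model_runtime_suspend(&d);     /* now there is a vote to drop */
	printf("vote after second suspend: %u\n", d.perf_state);
	return d.perf_state;
}
```

Calling model_replay_sequence() from a main() returns 0: the vote made
while suspended survives the first resume and is dropped at the next
suspend.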

> ...
>
> I'm not saying this example makes sense (it doesn't for me). It doesn't
> make sense to vote for a performance state while runtime suspended.
>
> But with this patch series we still allow that, and it will kind of
> produce inconsistent behavior that the performance state is applied
> immediately, even though the device is currently runtime-suspended.
> But once it runtime suspends again, suddenly it is dropped.

Yes.

Note that I have looked at the existing callers of
dev_pm_genpd_set_performance_state() in the kernel as of today. For
those, this should not be an issue, at least as far as I can tell.

>
> And when you say:
>
> > My main point is, if the device enters runtime suspend state, why
> > should we keep the vote for an OPP for the device? I mean, the device
> > isn't going to be used anyway.
> >
>
> A very similar point would be: "If the device *is* in runtime suspend
> state, why should we take a vote for an OPP for the device?"
>
> But I understand that this might be something we should address
> separately in a follow-up patch/discussion. Don't get me wrong, I agree
> this patch set is good, I just think we should go one step further and
> finally make this consistent and less prone to side effects.

I agree. We should look into how to change the behaviour. I intend to
have a look at it in a while.

>
> A good first step might be something like a WARN_ON_ONCE(...) if a
> device tries to vote for a performance state while runtime suspended.
> Then we might get a clearer picture which drivers do that currently.

That's an idea we could try, even if the number of users is quite
limited today. I can also analyze the callers with "git grep"; I will
probably find most of them that way.
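
As a rough illustration of the warn-once idea, here is a userspace
sketch. It is not a proposed patch: struct warn_model_dev and the
warn_model_*() names are hypothetical, and the helper only imitates the
"fire once" behaviour of the kernel's WARN_ON_ONCE() with a static flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; WARN_ON_ONCE() is only imitated here. */
struct warn_model_dev {
	unsigned int perf_state; /* current performance state vote */
	bool suspended;          /* true while runtime suspended */
};

int warn_model_count; /* number of warnings actually printed */

/* Warns on the first vote made while suspended, stays silent afterwards. */
bool warn_model_once_if_suspended(const struct warn_model_dev *d)
{
	static bool warned;

	if (d->suspended && !warned) {
		warned = true;
		warn_model_count++;
		fprintf(stderr, "performance state vote while runtime suspended\n");
	}
	return d->suspended;
}

/* The vote is still applied; the warning only makes such callers visible. */
void warn_model_set_perf_state(struct warn_model_dev *d, unsigned int state)
{
	assert(d != NULL);
	warn_model_once_if_suspended(d);
	d->perf_state = state;
}
```

The sketch keeps the current behaviour (the vote succeeds) and only adds
visibility, which matches the goal of first finding out which callers
vote while runtime suspended.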

>
> Stephan

That said, are you okay with us moving forward with the $subject
series (except patch 4)?

Kind regards
Uffe