Re: [PATCH 2/2] PM: sleep: Fix runtime PM based cpuidle support

From: Ulf Hansson
Date: Thu Oct 21 2021 - 07:49:25 EST


On Wed, 20 Oct 2021 at 20:18, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Wed, Sep 29, 2021 at 4:44 PM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> >
> > In the cpuidle-psci case, runtime PM in combination with the generic PM
> > domain (genpd) may be used when entering/exiting an idle state. More
> > precisely, genpd relies on runtime PM to be enabled for the attached device
> > (in this case it belongs to a CPU), to properly manage the reference
> > counting of its PM domain.
> >
> > This works fine most of the time, but during system suspend in the
> > dpm_suspend_late() phase, the PM core disables runtime PM for all devices.
> > Beyond this point and until runtime PM becomes re-enabled in the
> > dpm_resume_early() phase, calls to pm_runtime_get|put*() will fail.
> >
> > To make sure the reference counting in genpd becomes correct, we need to
> > prevent cpuidle-psci from using runtime PM when it has been disabled for
> > the device. Therefore, let's move the call to cpuidle_pause() from
> > dpm_suspend_noirq() to dpm_suspend_late() - and cpuidle_resume() from
> > dpm_resume_noirq() into dpm_resume_early().
> >
> > Diagnosed-by: Maulik Shah <mkshah@xxxxxxxxxxxxxx>
> > Suggested-by: Maulik Shah <mkshah@xxxxxxxxxxxxxx>
> > Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> > ---
> > drivers/base/power/main.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index cbea78e79f3d..1c753b651272 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -747,8 +747,6 @@ void dpm_resume_noirq(pm_message_t state)
> >
> >  	resume_device_irqs();
> >  	device_wakeup_disarm_wake_irqs();
> > -
> > -	cpuidle_resume();
> >  }
> >
> >  /**
> > @@ -870,6 +868,7 @@ void dpm_resume_early(pm_message_t state)
> >  	}
> >  	mutex_unlock(&dpm_list_mtx);
> >  	async_synchronize_full();
> > +	cpuidle_resume();
> >  	dpm_show_time(starttime, state, 0, "early");
> >  	trace_suspend_resume(TPS("dpm_resume_early"), state.event, false);
> >  }
> > @@ -1336,8 +1335,6 @@ int dpm_suspend_noirq(pm_message_t state)
> >  {
> >  	int ret;
> >
> > -	cpuidle_pause();
> > -
> >  	device_wakeup_arm_wake_irqs();
> >  	suspend_device_irqs();
> >
> > @@ -1467,6 +1464,7 @@ int dpm_suspend_late(pm_message_t state)
> >  	int error = 0;
> >
> >  	trace_suspend_resume(TPS("dpm_suspend_late"), state.event, true);
> > +	cpuidle_pause();
> >  	mutex_lock(&dpm_list_mtx);
> >  	pm_transition = state;
> >  	async_error = 0;
> > --
>
> Well, this is somewhat heavy-handed and it affects even the systems
> that don't really need to pause cpuidle at all in the suspend path.

Yes, I agree.

However, I am not really changing the behaviour in this regard:
cpuidle_pause() is already called in dpm_suspend_noirq() for
everybody today.
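
To make the ordering concrete, here is a rough user-space model of the
suspend phases. It is only a sketch, not kernel code: the enum, the
struct and both helpers are made-up names, and it treats "runtime PM
disabled" as a single per-phase switch, which is coarser than what the
PM core really does.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum phase { PH_SUSPEND, PH_SUSPEND_LATE, PH_SUSPEND_NOIRQ };

struct model {
	bool rpm_enabled;    /* runtime PM usable for the CPU device */
	bool cpuidle_paused; /* cpuidle_pause() has been called */
};

/* Coarse state of the model once the given phase has started. */
struct model state_in(enum phase p, bool pause_in_late)
{
	struct model m = { .rpm_enabled = true, .cpuidle_paused = false };

	if (p >= PH_SUSPEND_LATE) {
		m.rpm_enabled = false;           /* late phase disables runtime PM */
		if (pause_in_late)
			m.cpuidle_paused = true; /* with the $subject patch */
	}
	if (p >= PH_SUSPEND_NOIRQ)
		m.cpuidle_paused = true;         /* where the pause happens today */

	return m;
}

/* True if an idle entry could still attempt runtime PM while it is disabled. */
bool rpm_window_open(enum phase p, bool pause_in_late)
{
	struct model m = state_in(p, pause_in_late);

	return !m.cpuidle_paused && !m.rpm_enabled;
}
```

In this model the problematic window exists only in the late phase when
the pause is still done in the noirq phase; the set of systems that get
paused is the same in both variants.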

>
> Also, IIUC you don't need to pause cpuidle completely, but make it
> temporarily avoid idle states potentially affected by this issue. An
> additional CPUIDLE_STATE_DISABLED_ flag could be used for that I
> suppose and it could be set via cpuidle_suspend() called from the core
> next to cpufreq_suspend().

cpuidle_suspend() would then need to go and fetch the cpuidle driver
instance, which in some cases is one driver per CPU. Doesn't that get
rather messy?

Additionally, since find_deepest_state() is being called for
cpuidle_enter_s2idle() too, we would need to treat the new
CPUIDLE_STATE_DISABLED_ flag in a special way, right?
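
As one possible reading of "a special way", below is a simplified
user-space model of a deepest-state search. It assumes that the s2idle
path should still be allowed to pick a state that carries the new
suspend-only disable bit. MODEL_DISABLED_FOR_SUSPEND, struct model_state
and model_find_deepest() are all hypothetical, and the real helper takes
more parameters than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit standing in for the proposed CPUIDLE_STATE_DISABLED_ flag. */
#define MODEL_DISABLED_FOR_SUSPEND (1u << 0)

struct model_state {
	uint64_t exit_latency_ns;
	unsigned int disable; /* per-state disable bits */
	bool has_s2idle;      /* state provides an s2idle entry method */
};

/* Return the index of the deepest usable state; index 0 is the fallback. */
int model_find_deepest(const struct model_state *s, int count, bool s2idle)
{
	uint64_t latency_req = 0;
	int i, ret = 0;

	for (i = 1; i < count; i++) {
		unsigned int disable = s[i].disable;

		/* Assumed special case: s2idle ignores the suspend-only bit. */
		if (s2idle)
			disable &= ~MODEL_DISABLED_FOR_SUSPEND;

		if (disable || s[i].exit_latency_ns <= latency_req ||
		    (s2idle && !s[i].has_s2idle))
			continue;

		latency_req = s[i].exit_latency_ns;
		ret = i;
	}

	return ret;
}
```

The extra masking step is exactly the kind of special-casing the
question is about: every caller of the search would have to agree on
when the new bit counts.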

Is this really what we want?

>
> The other guys who rely on the cpuidle pausing today could be switched
> over to this new mechanism later and it would be possible to get rid
> of the pausing from the system suspend path completely.

Not pausing cpuidle when it's not needed makes perfect sense.
However, it looks to me like we could also implement that on top of
the $subject patch.

Unless you insist on the CPUIDLE_STATE_DISABLED_ way, I would probably
explore the option of letting a cpuidle driver set a global cpuidle flag
during ->probe(). Depending on whether this flag is set, we could simply
skip calling cpuidle_pause() during system suspend.
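
Roughly, and only as a user-space sketch of the idea, that could look
like the code below. Every name is hypothetical; no such flag or helper
exists in cpuidle today, and the two suspend/resume functions merely
stand in for the places where cpuidle_pause() and cpuidle_resume()
would be called.

```c
#include <stdbool.h>

/* Hypothetical global state; not an existing cpuidle interface. */
static bool model_pause_needed; /* set by a driver that relies on runtime PM */
static bool model_paused;

/* A driver that depends on runtime PM would call this from ->probe(). */
void model_request_suspend_pause(void)
{
	model_pause_needed = true;
}

/* Stand-in for the system suspend path. */
void model_suspend_late(void)
{
	if (model_pause_needed)
		model_paused = true;  /* cpuidle_pause() would go here */
}

/* Stand-in for the system resume path. */
void model_resume_early(void)
{
	if (model_pause_needed)
		model_paused = false; /* cpuidle_resume() would go here */
}

bool model_is_paused(void)
{
	return model_paused;
}
```

Systems where no driver sets the flag would then never pause cpuidle in
the suspend path.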

What do you think?

Kind regards
Uffe