Re: [PATCH v2] clk: qcom: gdsc: Remove direct runtime PM calls

From: Stephen Boyd
Date: Thu Nov 03 2022 - 12:55:53 EST


Quoting Johan Hovold (2022-11-03 07:31:09)
> On Wed, Nov 02, 2022 at 10:07:17AM -0700, Stephen Boyd wrote:
> > We shouldn't be calling runtime PM APIs from within the genpd
> > enable/disable path for a couple reasons.
> >
> > First, this causes an AA lockdep splat because genpd can call into genpd
> > code again while holding the genpd lock.
> >
> > WARNING: possible recursive locking detected
>
> > Second, this confuses runtime PM on CoachZ for the camera devices by
> > causing the camera clock controller's runtime PM usage_count to go
> > negative after resuming from suspend. This is because runtime PM is
> > being used on the clock controller while runtime PM is disabled for the
> > device.
> >
> > The reason for the negative count is because a GDSC is represented as a
> > genpd and each genpd that is attached to a device is resumed during the
> > noirq phase of system wide suspend/resume (see the noirq suspend ops
> > assignment in pm_genpd_init() for more details). The camera GDSCs are
> > attached to camera devices with the 'power-domains' property in DT.
> > Every device has runtime PM disabled in the late system suspend phase
> > via __device_suspend_late(). Runtime PM is not usable until runtime PM
> > is enabled in device_resume_early(). The noirq phases run after the
> > 'late' and before the 'early' phase of suspend/resume. When the genpds
> > are resumed in genpd_resume_noirq(), we call down into gdsc_enable()
> > that calls pm_runtime_resume_and_get() and that returns -EACCES to
> > indicate failure to resume because runtime PM is disabled for all
> > devices.
>
> Probably worth mentioning the fact that those runtime PM calls
> unconditionally failing during resume means that the GDSCs are never
> even enabled.
>
> Seems like the PM runtime usage counters would still be balanced after
> this though as they are decremented also on failure during suspend (i.e.
> domain remains off and no usage counter is incremented during resume).
>

I'm seeing negative usage counts.

> But this is clearly just very broken.
>
> > Upon closer inspection, calling runtime PM APIs like this in the GDSC
> > driver doesn't make sense. It was intended to make sure the GDSC for the
> > clock controller providing other GDSCs was enabled, specifically the
> > MMCX GDSC for the display clk controller on SM8250 (sm8250-dispcc), so
> > that GDSC register accesses succeeded. That will already happen because
> > we make the 'dev->pm_domain' a parent domain of each GDSC we register in
> > gdsc_register() via pm_genpd_add_subdomain(). When any of these GDSCs
> > are accessed, we'll enable the parent domain (in this specific case
> > MMCX).
> >
> > We also remove any getting of runtime PM during registration, because
> > when a genpd is registered it increments the count on the parent if the
> > genpd itself is already enabled. And finally, the runtime PM state of
> > the clk controller registering the GDSC shouldn't matter to the
> > subdomain setup. Therefore we always assign 'dev' unconditionally so
> > when GDSCs are removed we properly unlink the GDSC from the clk
> > controller's pm_domain.
>
> This last bit makes no sense as 'dev' was only used for the runtime PM
> management and should be removed by this patch.

Oh sheesh, the name 'gdsc_register()' really throws me off. I think it's
one gdsc being registered, but it's actually plural gdscs, sigh. I will
fix it.