Re: [PATCH v2 3/6] drm/panfrost: Implement ability to turn on/off GPU clocks in suspend

From: Chen-Yu Tsai
Date: Fri Nov 03 2023 - 04:58:59 EST


On Fri, Nov 3, 2023 at 4:54 PM AngeloGioacchino Del Regno
<angelogioacchino.delregno@xxxxxxxxxxxxx> wrote:
>
> Il 03/11/23 06:12, Chen-Yu Tsai ha scritto:
> > On Thu, Nov 2, 2023 at 10:26 PM AngeloGioacchino Del Regno
> > <angelogioacchino.delregno@xxxxxxxxxxxxx> wrote:
> >>
> >> Currently, the GPU is being internally powered off for runtime suspend
> >> and turned back on for runtime resume through commands sent to it, but
> >> note that the GPU doesn't need to be clocked during the poweroff state,
> >> hence it is possible to save some power on selected platforms.
> >>
> >> Add suspend and resume handlers for full system sleep and then add
> >> a new panfrost_gpu_pm enumeration and a pm_features variable in the
> >> panfrost_compatible structure: BIT(GPU_PM_CLK_DIS) will be used to
> >> enable this power saving technique only on SoCs that are able to
> >> safely use it.
> >>
> >> Note that this was implemented only for the system sleep case and not
> >> for runtime PM because testing on one of my MediaTek platforms showed
> >> issues when turning on and off clocks aggressively (in PM runtime)
> >> resulting in a full system lockup.
> >>
> >> Doing this only for full system sleep never showed issues during my
> >> testing by suspending and resuming the system continuously for more
> >> than 100 cycles.
> >>
> >> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
> >> ---
> >>
> >> Note: Even after fixing the panfrost_power_off() function, I'm still
> >> getting issues with turning off the clocks at .runtime_suspend() but
> >> this time, instead of getting a GPU lockup, the entire SoC will deadlock
> >> bringing down the entire system with it (so it's even worst!) :-)
> >
> > IIRC the power domain controller also manages some bus isolation bits
> > that prevent SoC lockup when the clock is disabled. Would reversing
> > the runtime PM calls and the clock calls help?
> >
>
> Thanks for the reminder, but I tested that already... that doesn't work.
>
> There's one more thing I tried: on the MFG iospace, there are debug registers
> that you can poll to check if all bus transactions are finished (so, if the bus
> is idle).
> During local testing, I even hacked in that, and even with the actual bus being
> completely idle, it still freezes... and also checked some more in downstream
> code (for Dimensity 9200, kernel 5.10) if there was any other "trick" that I
> could make use of, but to no avail.
>
> I'd propose to get at least this power saving upstreamed, then perhaps in the
> future we can somehow revisit this to implement some more aggressive power
> management code?
> We're still getting a generous power saving with this one, I'd say...
>
> Anyway, I expect us to be effectively able to be more aggressive here, but I
> also expect that to take quite a bit of time (and probably some help from
> MediaTek as well)...

Sounds good to me.

> Angelo
>
> > ChenYu
> >
> >> drivers/gpu/drm/panfrost/panfrost_device.c | 61 ++++++++++++++++++++--
> >> drivers/gpu/drm/panfrost/panfrost_device.h | 11 ++++
> >> 2 files changed, 68 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
> >> index 28f7046e1b1a..2022ed76a620 100644
> >> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
> >> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
> >> @@ -403,7 +403,7 @@ void panfrost_device_reset(struct panfrost_device *pfdev)
> >> panfrost_job_enable_interrupts(pfdev);
> >> }
> >>
> >> -static int panfrost_device_resume(struct device *dev)
> >> +static int panfrost_device_runtime_resume(struct device *dev)
> >> {
> >> struct panfrost_device *pfdev = dev_get_drvdata(dev);
> >>
> >> @@ -413,7 +413,7 @@ static int panfrost_device_resume(struct device *dev)
> >> return 0;
> >> }
> >>
> >> -static int panfrost_device_suspend(struct device *dev)
> >> +static int panfrost_device_runtime_suspend(struct device *dev)
> >> {
> >> struct panfrost_device *pfdev = dev_get_drvdata(dev);
> >>
> >> @@ -426,5 +426,58 @@ static int panfrost_device_suspend(struct device *dev)
> >> return 0;
> >> }
> >>
> >> -EXPORT_GPL_RUNTIME_DEV_PM_OPS(panfrost_pm_ops, panfrost_device_suspend,
> >> - panfrost_device_resume, NULL);
> >> +static int panfrost_device_resume(struct device *dev)
> >> +{
> >> + struct panfrost_device *pfdev = dev_get_drvdata(dev);
> >> + int ret;
> >> +
> >> + if (pfdev->comp->pm_features & BIT(GPU_PM_CLK_DIS)) {
> >> + ret = clk_enable(pfdev->clock);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + if (pfdev->bus_clock) {
> >> + ret = clk_enable(pfdev->bus_clock);
> >> + if (ret)
> >> + goto err_bus_clk;
> >> + }
> >> + }
> >> +
> >> + ret = pm_runtime_force_resume(dev);
> >> + if (ret)
> >> + goto err_resume;
> >> +
> >> + return 0;
> >> +
> >> +err_resume:
> >> + if (pfdev->comp->pm_features & BIT(GPU_PM_CLK_DIS) && pfdev->bus_clock)
> >> + clk_disable(pfdev->bus_clock);
> >> +err_bus_clk:
> >> + if (pfdev->comp->pm_features & BIT(GPU_PM_CLK_DIS))
> >> + clk_disable(pfdev->clock);
> >> + return ret;
> >> +}
> >> +
> >> +static int panfrost_device_suspend(struct device *dev)
> >> +{
> >> + struct panfrost_device *pfdev = dev_get_drvdata(dev);
> >> + int ret;
> >> +
> >> + ret = pm_runtime_force_suspend(dev);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + if (pfdev->comp->pm_features & BIT(GPU_PM_CLK_DIS)) {
> >> + clk_disable(pfdev->clock);
> >> +
> >> + if (pfdev->bus_clock)
> >> + clk_disable(pfdev->bus_clock);
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +EXPORT_GPL_DEV_PM_OPS(panfrost_pm_ops) = {
> >> + RUNTIME_PM_OPS(panfrost_device_runtime_suspend, panfrost_device_runtime_resume, NULL)
> >> + SYSTEM_SLEEP_PM_OPS(panfrost_device_suspend, panfrost_device_resume)
> >> +};
> >> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> >> index 1ef38f60d5dc..d7f179eb8ea3 100644
> >> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> >> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> >> @@ -25,6 +25,14 @@ struct panfrost_perfcnt;
> >> #define NUM_JOB_SLOTS 3
> >> #define MAX_PM_DOMAINS 5
> >>
> >> +/**
> >> + * enum panfrost_gpu_pm - Supported kernel power management features
> >> + * @GPU_PM_CLK_DIS: Allow disabling clocks during system suspend
> >> + */
> >> +enum panfrost_gpu_pm {
> >> + GPU_PM_CLK_DIS,
> >> +};
> >> +
> >> struct panfrost_features {
> >> u16 id;
> >> u16 revision;
> >> @@ -75,6 +83,9 @@ struct panfrost_compatible {
> >>
> >> /* Vendor implementation quirks callback */
> >> void (*vendor_quirk)(struct panfrost_device *pfdev);
> >> +
> >> + /* Allowed PM features */
> >> + u8 pm_features;
> >> };
> >>
> >> struct panfrost_device {
> >> --
> >> 2.42.0
> >>
>