Re: [PATCH] drm/msm: Disable frequency clamping on a630

From: Caleb Connolly
Date: Sat Aug 07 2021 - 15:21:56 EST


Hi Rob, Akhil,

On 29/07/2021 21:53, Rob Clark wrote:
On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
<caleb.connolly@xxxxxxxxxx> wrote:



On 29/07/2021 21:24, Rob Clark wrote:
On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
<caleb.connolly@xxxxxxxxxx> wrote:

Hi Rob,

I've done some more testing! It looks like before that patch ("drm/msm: Devfreq tuning") the GPU would never get above
the second frequency in the OPP table (342MHz) (at least, not in glxgears). With the patch applied it would more
aggressively jump up to the max frequency which seems to be unstable at the default regulator voltages.

*ohh*, yeah, ok, that would explain it

Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 0.988v (instead of the stock 0.516v) makes the GPU stable
at the higher frequencies.

Applying this patch reverts the behaviour, and the GPU never goes above 342MHz in glxgears, losing ~30% performance in
glxgear.

I think (?) that enabling CPR support would be the proper solution to this - that would ensure that the regulators run
at the voltage the hardware needs to be stable.

Is hacking the voltage higher (although ideally not quite that high) an acceptable short term solution until we have
CPR? Or would it be safer to just not make use of the higher frequencies on a630 for now?


tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is already
on CC and I added sboyd, maybe one of them knows better.

In the short term, removing the higher problematic OPPs from dts might
be a better option than this patch (which I'm dropping), since there
is nothing stopping other workloads from hitting higher OPPs.
Oh yeah that sounds like a more sensible workaround than mine .

I'm slightly curious why I didn't have problems at higher OPPs on my
c630 laptop (sdm850)
Perhaps you won the sillicon lottery - iirc sdm850 is binned for higher clocks as is out of the factory.

Would it be best to drop the OPPs for all devices? Or just those affected? I guess it's possible another c630 might
crash where yours doesn't?

I've not heard any reports of similar issues from the handful of other
folks with c630's on #aarch64-laptops.. but I can't really say if that
is luck or not.
It looks like this affects at least the OnePlus 6 and PocoPhone F1, I've done some more poking and the following diff seems to fix the stability issues completely, it seems the delay is required to let the update propagate.

This doesn't feel like the right fix, but hopefully it's enough to come up with a better solution than disabling the new devfreq behaviour on a630.

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index d7cec7f0dde0..69e2a5e84dae 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
return;
}

+ dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
+
+ usleep_range(300, 500);
+
gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0);

gmu_write(gmu, REG_A6XX_GMU_DCVS_PERF_SETTING,
@@ -158,7 +162,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
if (ret)
dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);

- dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
pm_runtime_put(gmu->dev);
}

Maybe just remove it for affected devices? But I'll defer to Bjorn.

BR,
-R


--
Kind Regards,
Caleb (they/them)