Re: [PATCH] arm64: dts: qcom: sdm845: Fix wild reboot during Antutu test

From: Luca Weiss
Date: Tue Jan 16 2024 - 09:03:45 EST


On Tue Jan 16, 2024 at 1:51 PM CET, Daniel Lezcano wrote:
> On 16/01/2024 13:37, Luca Weiss wrote:
> > On Tue Jan 16, 2024 at 12:59 PM CET, Daniel Lezcano wrote:
> >> Running an Antutu benchmark makes the board do a hard reboot.
> >>
> >> Cause: it appears the gpu-bottom and gpu-top temperature sensors are showing
> >> too high temperatures, above 115°C.
> >>
> >> Out-of-tree configurations show the gpu thermal zone is configured to
> >> be mitigated at 85°C with devfreq.
> >>
> >> Add the DT snippet to enable the thermal mitigation on the sdm845
> >> based board.
> >>
> >> Fixes: c79800103eb18 ("arm64: dts: sdm845: Add gpu and gmu device nodes")
> >> Cc: Amit Pundir <amit.pundir@xxxxxxxxxx>
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
> >
> > A part of this is already included with this patch:
> > https://lore.kernel.org/linux-arm-msm/20240102-topic-gpu_cooling-v1-4-fda30c57e353@xxxxxxxxxx/
> >
> > Maybe rebase on top of that one and add the 85degC trip point or
> > something?
>
> Actually, I think the patch is wrong.

I recommend telling Konrad in that patch then, not me :)

>
> The cooling effect does not operate on a 'hot' trip point, as that type is
> handled like a critical trip point: the governor is not invoked, so no
> mitigation happens. A 'hot' trip point only results in a notification being
> sent to userspace, giving it a last chance to do something before
> 'critical' is reached, where the system is shut down.
>
> I suggest reverting it and picking the one I proposed.

It hasn't been applied yet so it can be fixed in v2 there.
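
For reference, a rough and untested sketch of what the trips and cooling
map inside one of the gpu thermal zones could look like in such a v2,
following the explanation above: a 'passive' trip at 85°C (so the governor
is actually invoked) mapped to the GPU as the cooling device. The labels
gpu_top_alert0 and gpu, the node names, the hysteresis value and the
assumption that the gpu node has #cooling-cells = <2> are mine for
illustration and are not taken from either patch.

```dts
#include <dt-bindings/thermal/thermal.h>	/* for THERMAL_NO_LIMIT */

/* Fragment meant to sit inside a gpu thermal zone node (illustrative) */
trips {
	/* 'passive' rather than 'hot', so the thermal governor runs */
	gpu_top_alert0: trip-point0 {
		temperature = <85000>;	/* millicelsius, i.e. 85°C */
		hysteresis = <1000>;	/* assumed value */
		type = "passive";
	};
};

cooling-maps {
	map0 {
		trip = <&gpu_top_alert0>;
		/* assumes the gpu node is registered as a cooling device */
		cooling-device = <&gpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
	};
};
```

A 'hot' or 'critical' trip could still be kept above that as a last
resort, but it would not trigger any devfreq mitigation on its own.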

Regards
Luca