Re: [PATCH v2] drm/panfrost: Ignore core_mask for poweroff and sync interrupts

From: AngeloGioacchino Del Regno
Date: Thu Nov 23 2023 - 06:58:30 EST


Il 23/11/23 12:57, Krzysztof Kozlowski ha scritto:
On 23/11/2023 12:50, AngeloGioacchino Del Regno wrote:
Some SoCs may be equipped with a GPU containing two core groups
and this is exactly the case of Samsung's Exynos 5422 featuring
an ARM Mali-T628 MP6 GPU: the support for this GPU in Panfrost
is partial, as this driver currently supports using only one
core group and that's reflected on all parts of it, including
the power on (and power off, previously to this patch) function.

The issue with this is that even though executing the soft reset
operation should power off all cores unconditionally, on at least
one platform we're seeing a crash that seems to be happening due
to an interrupt firing which may be because we are calling power
transition only on the first core group, leaving the second one
unchanged, or because ISR execution was pending before entering
the panfrost_gpu_power_off() function and executed after powering
off the GPU cores, or all of the above.

Finally, solve this by introducing a new panfrost_gpu_suspend_irq()
helper function and changing the panfrost_device_suspend() flow to
1. Mask and clear all interrupts: we don't need nor want any, as
for power_off() we are polling PWRTRANS, but we anyway don't
want GPU IRQs to fire while it is suspended/powered off;
2. Call synchronize_irq() after that to make sure that any pending
ISR is executed before powering off the GPU Shaders/Tilers/L2
hence avoiding unpowered registers R/W; and
3. Ignore the core_mask and ask the GPU to poweroff both core groups

Of course it was also necessary to add a `irq` variable to `struct
panfrost_device` as we need to get that in panfrost_gpu_power_off()
for calling synchronize_irq() on it.

Fixes: 22aa1a209018 ("drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()")
[Regression detected on Odroid HC1, Exynos5422, Mali-T628 MP6]
Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
---

Changes in v2:
- Fixed the commit hash of "Really power off [...]"
- Actually based on a clean next-20231121
- Renamed "irq" to "gpu_irq" as per Boris' suggestion
- Moved the IRQ mask/clear/sync to a helper function and added
a call to that in panfrost_device.c instead of doing that in
panfrost_gpu_power_off().

NOTE: I didn't split 1+2 from 3 as suggested by Boris, and I'm sending
this one without waiting for feedback on my reasons for that which I
explained as a reply to v1 because the former couldn't be applied to
linux-next, and I want to unblock Krzysztof ASAP to get this tested.


This does not compile.


I really have to take a break. My brain starts failing, as I can see.

Sorry.