Re: [PATCH v2 1/6] drm/panfrost: Perform hard reset to recover GPU if soft reset fails

From: Steven Price
Date: Wed Nov 08 2023 - 10:44:49 EST


On 02/11/2023 14:26, AngeloGioacchino Del Regno wrote:
> Even though soft reset should ideally never fail, during development of
> some power management features I managed to get some bits wrong: this
> resulted in GPU soft reset failures, where the GPU was never able to
> recover, not even after suspend/resume cycles, meaning that the only
> way to get functionality back was to reboot the machine.
>
> Perform a hard reset after a soft reset failure to be able to recover
> the GPU during runtime (so, without any machine reboot).
>
> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
> ---
> drivers/gpu/drm/panfrost/panfrost_gpu.c | 14 ++++++++++----
> drivers/gpu/drm/panfrost/panfrost_regs.h | 1 +
> 2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index fad75b6e543e..7e9e2cf26e4d 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -60,14 +60,20 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
>
> gpu_write(pfdev, GPU_INT_MASK, 0);
> gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_RESET_COMPLETED);
> - gpu_write(pfdev, GPU_CMD, GPU_CMD_SOFT_RESET);
>
> + gpu_write(pfdev, GPU_CMD, GPU_CMD_SOFT_RESET);
> ret = readl_relaxed_poll_timeout(pfdev->iomem + GPU_INT_RAWSTAT,
> val, val & GPU_IRQ_RESET_COMPLETED, 100, 10000);
> -

I'm not sure what's going on with the blank lines above. AFAICT there's
no actual change, just a blank line being moved. It's best to avoid
blank line changes to keep the diff readable.

> if (ret) {
> - dev_err(pfdev->dev, "gpu soft reset timed out\n");
> - return ret;
> + dev_err(pfdev->dev, "gpu soft reset timed out, attempting hard reset\n");
> +
> + gpu_write(pfdev, GPU_CMD, GPU_CMD_HARD_RESET);
> + ret = readl_relaxed_poll_timeout(pfdev->iomem + GPU_INT_RAWSTAT,
> + val, val & GPU_IRQ_RESET_COMPLETED, 100, 10000);

NIT: checkpatch complains about the alignment here.

Other than the minor comments above, this looks fine. Hard reset isn't
something we want to use (there's a possibility of locking up the system
if it occurs during a bus transaction), but it can sometimes recover an
otherwise completely locked up GPU.
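
For illustration, the fallback order in the patch can be sketched as a
small userspace mock. This is not driver code. The mock_* names, the
struct and the -1 timeout value are all hypothetical stand-ins. Only
the two command values and the two log messages are taken from the
quoted hunks. The point is just that the hard reset is issued only
after the soft reset has timed out, and that an error is returned only
if both fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Command values as defined in the quoted panfrost_regs.h hunk. */
#define GPU_CMD_SOFT_RESET 0x01
#define GPU_CMD_HARD_RESET 0x02

/* Hypothetical mock of the hardware; nothing here exists in panfrost. */
struct mock_gpu {
	bool soft_reset_works;	/* does the mock complete a soft reset? */
	bool hard_reset_works;	/* does the mock complete a hard reset? */
	bool reset_completed;	/* stands in for GPU_IRQ_RESET_COMPLETED */
	int hard_resets_issued;	/* counts GPU_CMD_HARD_RESET writes */
};

/* Stand-in for gpu_write(pfdev, GPU_CMD, cmd). */
static void mock_gpu_write_cmd(struct mock_gpu *gpu, uint32_t cmd)
{
	if (cmd == GPU_CMD_SOFT_RESET) {
		gpu->reset_completed = gpu->soft_reset_works;
	} else if (cmd == GPU_CMD_HARD_RESET) {
		gpu->hard_resets_issued++;
		gpu->reset_completed = gpu->hard_reset_works;
	}
}

/*
 * Stand-in for the readl_relaxed_poll_timeout() call: returns 0 once the
 * completion flag is seen, or -1 (assumed error value) after max_polls.
 */
static int mock_poll_reset_completed(const struct mock_gpu *gpu, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (gpu->reset_completed)
			return 0;
	}
	return -1;
}

/* Soft reset first; escalate to hard reset only if the soft reset times out. */
static int mock_gpu_reset(struct mock_gpu *gpu)
{
	int ret;

	assert(gpu != NULL);

	/* Mirrors clearing GPU_IRQ_RESET_COMPLETED before issuing the reset. */
	gpu->reset_completed = false;

	mock_gpu_write_cmd(gpu, GPU_CMD_SOFT_RESET);
	ret = mock_poll_reset_completed(gpu, 100);
	if (ret) {
		fprintf(stderr, "gpu soft reset timed out, attempting hard reset\n");

		mock_gpu_write_cmd(gpu, GPU_CMD_HARD_RESET);
		ret = mock_poll_reset_completed(gpu, 100);
		if (ret) {
			fprintf(stderr, "gpu hard reset timed out\n");
			return ret;
		}
	}

	return 0;
}
```

With a mock where only the soft reset is broken, mock_gpu_reset()
returns 0 after exactly one hard reset. With a healthy mock the hard
reset path is never taken, which is the behaviour we want given the
risk described above.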

Steve

> + if (ret) {
> + dev_err(pfdev->dev, "gpu hard reset timed out\n");
> + return ret;
> + }
> }
>
> gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
> index 55ec807550b3..c25743b05c55 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
> @@ -44,6 +44,7 @@
> GPU_IRQ_MULTIPLE_FAULT)
> #define GPU_CMD 0x30
> #define GPU_CMD_SOFT_RESET 0x01
> +#define GPU_CMD_HARD_RESET 0x02
> #define GPU_CMD_PERFCNT_CLEAR 0x03
> #define GPU_CMD_PERFCNT_SAMPLE 0x04
> #define GPU_CMD_CYCLE_COUNT_START 0x05