Re: [PATCH v2 3/3] drm/panfrost: Synchronize and disable interrupts before powering off

From: Boris Brezillon
Date: Tue Nov 28 2023 - 10:38:19 EST


On Tue, 28 Nov 2023 16:10:43 +0100
AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
wrote:

> Il 28/11/23 15:06, Boris Brezillon ha scritto:
> > On Tue, 28 Nov 2023 13:45:10 +0100
> > AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
> > wrote:
> >
> >> To make sure that we don't unintentionally perform any unclocked and/or
> >> unpowered R/W operation on GPU registers, before turning off clocks and
> >> regulators we must make sure that no GPU, JOB or MMU ISR execution is
> >> pending: doing that required to add a mechanism to synchronize the
> >> interrupts on suspend.
> >>
> >> Add functions panfrost_{gpu,job,mmu}_suspend_irq() which will perform
> >> interrupts masking and ISR execution synchronization, and then call
> >> those in the panfrost_device_runtime_suspend() handler in the exact
> >> sequence of job (may require mmu!) -> mmu -> gpu.
> >>
> >> As a side note, JOB and MMU suspend_irq functions needed some special
> >> treatment: as their interrupt handlers will unmask interrupts, it was
> >> necessary to add a bitmap for "is_suspending" which is used to address
> >> the possible corner case of unintentional IRQ unmasking because of ISR
> >> execution after a call to synchronize_irq().
> >>
> >> Of course, unmasking the interrupts is being done as part of the reset
> >> happening during runtime_resume(): since we're anyway resuming all of
> >> GPU, JOB, MMU, the only additional action is to zero out the newly
> >> introduced `is_suspending` bitmap directly in the resume handler, as
> >> to avoid adding panfrost_{job,mmu}_resume_irq() function just for
> >> clearing own bits, especially because it currently makes way more sense
> >> to just zero out the bitmap.
> >>
> >> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
> >> ---
> >> drivers/gpu/drm/panfrost/panfrost_device.c | 4 ++++
> >> drivers/gpu/drm/panfrost/panfrost_device.h | 7 +++++++
> >> drivers/gpu/drm/panfrost/panfrost_gpu.c | 7 +++++++
> >> drivers/gpu/drm/panfrost/panfrost_gpu.h | 1 +
> >> drivers/gpu/drm/panfrost/panfrost_job.c | 18 +++++++++++++++---
> >> drivers/gpu/drm/panfrost/panfrost_job.h | 1 +
> >> drivers/gpu/drm/panfrost/panfrost_mmu.c | 17 ++++++++++++++---
> >> drivers/gpu/drm/panfrost/panfrost_mmu.h | 1 +
> >> 8 files changed, 50 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
> >> index c90ad5ee34e7..ed34aa55a7da 100644
> >> --- a/drivers/gpu/drm/panfrost/panfrost_device.c
> >> +++ b/drivers/gpu/drm/panfrost/panfrost_device.c
> >> @@ -407,6 +407,7 @@ static int panfrost_device_runtime_resume(struct device *dev)
> >> {
> >> struct panfrost_device *pfdev = dev_get_drvdata(dev);
> >>
> >> + bitmap_zero(pfdev->is_suspending, PANFROST_COMP_BIT_MAX);
> >
> > I would let each sub-block clear their bit in the reset path, since
> > that's where the IRQs are effectively unmasked.
> >
>
>
> Honestly I wouldn't like seeing that: the reason is that this is something that
> is done *for* suspend/resume and only for that, while reset may be called out of
> the suspend/resume handlers.
>
> I find clearing the suspend bits in the HW reset path a bit confusing, especially
> when it is possible to avoid doing it there...

Well, I do think it's preferable to keep the irq_is_no_longer_suspended
state update where the interrupt is effectively unmasked. Note that
when you do a reset, the IRQ is silently suspended just after the
reset happens, because the xxx_INT_MASKs are restored to their default
value, so I do consider that clearing this bit in the reset path makes
sense.

>
> >> panfrost_device_reset(pfdev);
> >> panfrost_devfreq_resume(pfdev);
> >>
> >> @@ -421,6 +422,9 @@ static int panfrost_device_runtime_suspend(struct device *dev)
> >> return -EBUSY;
> >>
> >> panfrost_devfreq_suspend(pfdev);
> >> + panfrost_job_suspend_irq(pfdev);
> >> + panfrost_mmu_suspend_irq(pfdev);
> >> + panfrost_gpu_suspend_irq(pfdev);
> >> panfrost_gpu_power_off(pfdev);
> >>
> >> return 0;
> >> diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
> >> index 54a8aad54259..29f89f2d3679 100644
> >> --- a/drivers/gpu/drm/panfrost/panfrost_device.h
> >> +++ b/drivers/gpu/drm/panfrost/panfrost_device.h
> >> @@ -25,6 +25,12 @@ struct panfrost_perfcnt;
> >> #define NUM_JOB_SLOTS 3
> >> #define MAX_PM_DOMAINS 5
> >>
> >> +enum panfrost_drv_comp_bits {
> >> + PANFROST_COMP_BIT_MMU,
> >> + PANFROST_COMP_BIT_JOB,
> >> + PANFROST_COMP_BIT_MAX
> >> +};
> >> +
> >> /**
> >> * enum panfrost_gpu_pm - Supported kernel power management features
> >> * @GPU_PM_CLK_DIS: Allow disabling clocks during system suspend
> >> @@ -109,6 +115,7 @@ struct panfrost_device {
> >>
> >> struct panfrost_features features;
> >> const struct panfrost_compatible *comp;
> >> + DECLARE_BITMAP(is_suspending, PANFROST_COMP_BIT_MAX);
> >
> > nit: Maybe s/is_suspending/suspended_irqs/, given the state remains
> > until the device is resumed.
>
> If we keep the `is_suspending` name, we can use this one more generically in
> case we ever need to, what do you think?

I'm lost. Why would we want to reserve a name for something we don't
know about? My comment was mostly relating to the fact this bitmap
doesn't reflect the is_suspending state, but rather is_suspended,
because it remains set until the device is resumed. And we actually want
it to reflect the is_suspended state, so we can catch interrupts that
are not for us without reading regs in the hard irq handler, when the
GPU is suspended.