Re: [PATCH v2 3/3] drm/panfrost: Synchronize and disable interrupts before powering off

From: AngeloGioacchino Del Regno
Date: Tue Nov 28 2023 - 11:10:28 EST


Il 28/11/23 16:53, Boris Brezillon ha scritto:
On Tue, 28 Nov 2023 16:10:45 +0100
AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
wrote:

static void panfrost_job_handle_err(struct panfrost_device *pfdev,
struct panfrost_job *job,
unsigned int js)
@@ -792,9 +800,13 @@ static irqreturn_t panfrost_job_irq_handler_thread(int irq, void *data)
struct panfrost_device *pfdev = data;
panfrost_job_handle_irqs(pfdev);
- job_write(pfdev, JOB_INT_MASK,
- GENMASK(16 + NUM_JOB_SLOTS - 1, 16) |
- GENMASK(NUM_JOB_SLOTS - 1, 0));
+
+ /* Enable interrupts only if we're not about to get suspended */
+ if (!test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending))

The irq-line is requested with IRQF_SHARED, meaning the line might be
shared between all three GPU IRQs, but also with other devices. I think
if we want to be totally safe, we need to also check this is_suspending
field in the hard irq handlers before accessing the xxx_INT_yyy
registers.

This would mean that we would have to force canceling jobs in the suspend
handler, but if the IRQ never fired, would we still be able to find the
right bits flipped in JOB_INT_RAWSTAT?

There should be no jobs left if we enter suspend. If there is, that's a
bug we should fix, but I'm digressing.


From what I understand, are you suggesting to call, in job_suspend_irq()
something like

void panfrost_job_suspend_irq(struct panfrost_device *pfdev)
{
u32 status;

set_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending);

job_write(pfdev, JOB_INT_MASK, 0);
synchronize_irq(pfdev->js->irq);

status = job_read(pfdev, JOB_INT_STAT);

I guess you meant _RAWSTAT. _STAT should always be zero after we've
written 0 to _INT_MASK.


Whoops! Yes, as I wrote up there, I meant _RAWSTAT, sorry! :-)

if (status)
panfrost_job_irq_handler_thread(pfdev->js->irq, (void*)pfdev);

Nope, we don't need to read the STAT reg and forcibly call the threaded
handler if it's != 0. The synchronize_irq() call should do exactly that
(make sure all pending interrupts are processed before returning), and
our previous job_write(pfdev, JOB_INT_MASK, 0) guarantees that no new
interrupts will kick in after that point.


Unless we synchronize_irq() *before* masking all interrupts (which would be
wrong, as some interrupt could still fire after execution of the ISR), we get
*either of* two scenarios:

- COMP_BIT_JOB is not set, softirq thread unmasks some interrupts by
writing to JOB_INT_MASK; or
- COMP_BIT_JOB is set, hardirq handler returns IRQ_NONE, the threaded
interrupt handler doesn't get executed, jobs are not canceled.

So if we don't forbicly call the threaded handler if RAWSTAT != 0 in there,
and if the extra check is present in the hardirq handler, and if the hardirq
handler wasn't executed already before our synchronize_irq() call (so: if the
hardirq execution has to be done to synchronize irqs), we are not guaranteeing
that jobs cancellation/dequeuing/removal/whatever-handling is done before
entering suspend.

That, unless the suggestion was to call panfrost_job_handle_irqs() instead of
the handler thread like that (because reading it back, it makes sense to do so).

Cheers!

}

and then while still retaining the check in the IRQ thread handler, also
check it in the hardirq handler like

static irqreturn_t panfrost_job_irq_handler(int irq, void *data)
{
struct panfrost_device *pfdev = data;
u32 status;

if (!test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending))
return IRQ_NONE;

Yes, that's the extra check I was talking about, and that's also the
very reason I'm suggesting to call this field suspended_irqs instead of
is_suspending. Ultimately, each bit in this bitmap encodes the status
of a specific IRQ, not the transition from active-to-suspended,
otherwise we'd be clearing the bit at the end of
panfrost_job_suspend_irq(), right after the synchronize_irq(). But if
we were doing that, our hard IRQ handler could be called because other
devices raised an interrupt on the very same IRQ line while we are
suspended, and we'd be doing an invalid GPU reg read while the
clks/power-domains are off.


status = job_read(pfdev, JOB_INT_STAT);
if (!status)
return IRQ_NONE;

job_write(pfdev, JOB_INT_MASK, 0);
return IRQ_WAKE_THREAD;
}

(rinse and repeat for panfrost_mmu)

..or am I misunderstanding you?

Cheers,
Angelo