Re: [RESEND RFC PATCH v1 1/2] irq/spurious: Reset irqs_unhandled if an irq_thread handles one IRQ request

From: Leonardo Bras
Date: Wed Jan 17 2024 - 17:46:58 EST


On Wed, Jan 17, 2024 at 11:08:44PM +0100, Thomas Gleixner wrote:
> On Tue, Jan 16 2024 at 04:36, Leonardo Bras wrote:
> > This IRQ line disable bug can be easily reproduced with a serial8250
> > console on a PREEMPT_RT kernel: it only takes the user to print a lot
> > of text to the console (or to ttyS0): around 300k chars should be
> > enough.
>
> That has nothing to do with RT, it's a problem of force threaded
> interrupts in combination with an edge type interrupt line and a
> hardware which keeps firing interrupts forever.

Hello Thomas, thanks for your feedback!

I agreed it has nothing to do with RT.
I just mentioned PREEMPT_RT as my test case scenario, since it enables
force-threaded IRQs.

>
> > To fix this bug, reset irqs_unhandled whenever irq_thread handles at least
> > one IRQ request.
>
> This papers over the symptom and makes runaway detection way weaker for
> all interrupts or breaks it completely.

This change is supposed to only touch threaded interruptions, since it will
reach the included line only if (action_ret == IRQ_WAKE_THREAD) and if
desc->threads_handled changes since the last IRQ request.

This incrementing also happens only on irq_forced_thread_fn() and
irq_thread_fn(), which are called only from irq_thread_fn().

But I get the overall worry about having this making runaway detection way
weaker for all threaded interrupts.

I have previously worked on a solution that can be more precise and be an
opt-in for drivers instead of a general solution:

It required a change in IRQ interface that let the handlers inform how
many IRQs were actually handled (batching). This number would then be
added to desc->threads_handle (in irq_*thread_fn(), just changing the
atomic_inc() to atomic_add()), and then subtracted from irqs_unhandled
at note_interrupt().

In the serial8250 case, the driver would be changed to use that interface,
since it's already able to process multiple IRQs, and the bug just
vanishes.

This also solved the serial driver issue, but required a deeper change in
the code, which caused me to consider a simpler solution first.

This solution sure does give better runnaway detection. Do you think it
would be better that the one I sent in this patch?

>
> The problem with edge type interrupts is that we cannot mask them like
> we do with level type interrupts in the hard interrupt handler and
> unmask them once the threaded handler finishes.
>
> So yes, we need special rules here when:
>
> 1) The interrupt handler is force threaded
>
> 2) The interrupt line is edge type
>
> 3) The accumulated unhandled interrupts are within a sane margin
>
> Thanks,
>
> tglx
>

Completelly agree, that's why I am suggesting dealing with threaded
interruptions in a different way: reseting the unhandled count when it
handles a request.

I am not sure how force threaded and just threaded are different in this
scenario. Could you help me understand?

Thanks!
Leo