Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

From: Ming Lei
Date: Tue Sep 03 2019 - 02:31:49 EST


Hi Daniel,

On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote:
>
> Hi Ming Lei,
>
> On 03/09/2019 05:30, Ming Lei wrote:
>
> [ ... ]
>
>
> >>> 2) irq/timing doesn't cover softirq
> >>
> >> That's solvable, right?
> >
> > Yeah, we can extend irq/timing, but ugly for irq/timing, since irq/timing
> > focuses on hardirq predication, and softirq isn't involved in that
> > purpose.
> >
> >>
> >>> Daniel, could you take a look and see if irq flood detection can be
> >>> implemented easily by irq/timing.c?
> >>
> >> I assume you can take a look as well, right?
> >
> > Yeah, I have looked at the code for a while, but I think that irq/timing
> > could become complicated unnecessarily for covering irq flood detection,
> > meantime it is much less efficient for detecting IRQ flood.
>
> In the series, there is nothing describing rigorously the problem (I can
> only guess) and why the proposed solution solves it.
>
> What is your definition of an 'irq flood'? A high irq load? An irq
> arriving while we are processing the previous one in the bottom halves?

So far, it means that handling interrupt & softirq takes all utilization
of one CPU, then processes can't be run on this CPU basically, usually
sort of CPU lockup warning will be triggered.

>
> The patch 2/4 description says "however IO completion is only done on
> one of these submission CPU cores". That describes the bottleneck and
> then the patch says "Add IRQF_RESCUE_THREAD to create one interrupt
> thread handler", what is the rational between the bottleneck (problem)
> and the irqf_rescue_thread (solution)?

The solution is to switch to handle this interrupt on the created rescue
irq thread context when irq flood is detected, and 'this interrupt' means
the interrupt requested with IRQF_RESCUE_THREAD.

>
> Is it really the solution to track the irq timings to detect a flood?

The solution tracks the time taken on running do_IRQ() for each CPU.


Thanks,
Ming