Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

From: Daniel Lezcano
Date: Tue Sep 03 2019 - 02:40:48 EST


On 03/09/2019 08:31, Ming Lei wrote:
> Hi Daniel,
>
> On Tue, Sep 03, 2019 at 07:59:39AM +0200, Daniel Lezcano wrote:
>>
>> Hi Ming Lei,
>>
>> On 03/09/2019 05:30, Ming Lei wrote:
>>
>> [ ... ]
>>
>>
>>>>> 2) irq/timing doesn't cover softirq
>>>>
>>>> That's solvable, right?
>>>
>>> Yeah, we can extend irq/timing, but that would be ugly, since irq/timing
>>> focuses on hardirq prediction, and softirq isn't involved in that
>>> purpose.
>>>
>>>>
>>>>> Daniel, could you take a look and see if irq flood detection can be
>>>>> implemented easily by irq/timing.c?
>>>>
>>>> I assume you can take a look as well, right?
>>>
>>> Yeah, I have looked at the code for a while, but I think that irq/timing
>>> could become unnecessarily complicated if it had to cover irq flood
>>> detection, and it is also much less efficient for detecting an IRQ flood.
>>
>> In the series, there is nothing that rigorously describes the problem (I
>> can only guess) or why the proposed solution solves it.
>>
>> What is your definition of an 'irq flood'? A high irq load? An irq
>> arriving while we are processing the previous one in the bottom halves?
>
> So far, it means that handling interrupts and softirqs consumes all of
> one CPU, so processes basically cannot run on that CPU, and usually some
> sort of CPU lockup warning is triggered.

It is a scheduler problem, then?

>> The patch 2/4 description says "however IO completion is only done on
>> one of these submission CPU cores". That describes the bottleneck and
>> then the patch says "Add IRQF_RESCUE_THREAD to create one interrupt
>> thread handler", what is the rationale linking the bottleneck (problem)
>> and the irqf_rescue_thread (solution)?
>
> The solution is to switch to handle this interrupt on the created rescue
> irq thread context when irq flood is detected, and 'this interrupt' means
> the interrupt requested with IRQF_RESCUE_THREAD.
>
>>
>> Is it really the solution to track the irq timings to detect a flood?
>
> The solution tracks the time taken on running do_IRQ() for each CPU.

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog