Re: [PATCH] irqchip: omap-intc: fix spurious irq handling

From: John Ogness
Date: Tue Oct 20 2015 - 03:33:08 EST


On 2015-10-20, Sekhar Nori <nsekhar@xxxxxx> wrote:
>> Do you know what really is causing the spurious interrupts in your
>> case?
>
> No, not yet.

According to the TRM this is normal behavior if conditions that might
affect priority are changed during priority sorting.

6.2.5 ARM A8 INTC Spurious Interrupt Handling

The spurious flag indicates whether the result of the sorting (a
window of 10 INTC functional clock cycles after the interrupt
assertion) is invalid. The sorting is invalid if:

- The interrupt that triggered the sorting is no longer active
during the sorting.

- A change in the mask has affected the result during the sorting
time.

>> In all the cases I've seen, the spurious interrupts were caused by a
>> missing flush of the posted write acking the IRQ at the device driver
>> for the _previously triggered_ INTC interrupt.
>>
>> If you have a reproducible case, I suggest you test that by printing
>> out the previous interrupt to check if that makes sense. And then see
>> if adding the missing read back to that interrupt handler fixes the
>> issue.
>
> Okay, that's good to know. Thanks for the hints and history of your debug
> on OMAP3. The issue is not easily reproducible in my case. But if I try
> hard enough, I can hit it. So I can surely try your hints.

I can reproduce the situation very easily. After running a test for a
few minutes and printing out the previous interrupt, I have the
following list. These are the irq numbers seen by the handler before the
spurious interrupt triggered.

INT12 - EDMACOMPINT - TPCC (EDMA)
INT41 - 3PGSWRXINT0 - CPSW (Ethernet)
INT42 - 3PGSWTXINT0 - CPSW (Ethernet)
INT68 - TINT2 - DMTIMER2
INT72 - UART0INT - UART0
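
For reference, the bookkeeping behind a list like this can be sketched
roughly as follows. This is a user-space model and not the driver code:
the names (intc_model_handle, prev_irq, spurious_after) are made up for
illustration, and the SIR layout (ActiveIRQ in bits 6:0, SpuriousIRQ
flag in bits 31:7) is assumed here from the TRM register description.

```c
#include <stdint.h>
#include <stdio.h>

#define ACTIVEIRQ_MASK   0x7fu              /* SIR bits 6:0 (assumed layout) */
#define SPURIOUSIRQ_MASK (0x1ffffffu << 7)  /* SIR bits 31:7 (assumed layout) */
#define NR_IRQS_MODEL    128

static int prev_irq = -1;                          /* last valid irq seen */
static unsigned int spurious_after[NR_IRQS_MODEL]; /* count per previous irq */

/* Feed one raw SIR value; returns the irq number, or -1 if spurious. */
static int intc_model_handle(uint32_t sir)
{
	if (sir & SPURIOUSIRQ_MASK) {
		if (prev_irq >= 0) {
			spurious_after[prev_irq]++;
			printf("spurious irq, previous was INT%d\n", prev_irq);
		}
		return -1;
	}

	/* Valid result: remember it for the next spurious report. */
	prev_irq = (int)(sir & ACTIVEIRQ_MASK);
	return prev_irq;
}
```

Dumping spurious_after[] after a test run would give one counter per
previous interrupt, which is one way to see whether a single source
dominates.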

From this I do not think we can put the blame on any single driver. I
trigger this situation very easily by putting a load of 7,000+
interrupts per second on the system. With a sorting window of 10 INTC
functional clock cycles per interrupt, that is 70,000+ INTC clock cycles
per second during which a change in the interrupt priority conditions
would invalidate the priority sorting and thus cause a spurious
interrupt.
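
The arithmetic behind that figure, as a small sketch. The 10-cycle
window comes from the TRM text quoted above; the function name is made
up, and the estimate assumes the sorting windows of separate interrupts
do not overlap.

```c
/* INTC cycles per second during which priority sorting is in progress. */
static unsigned long sorting_cycles_per_sec(unsigned long irqs_per_sec,
					    unsigned long window_cycles)
{
	return irqs_per_sec * window_cycles;
}
```

Calling sorting_cycles_per_sec(7000, 10) returns 70000.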

I'm not sure if we can/should do anything more than Sekhar's patch of
acknowledging the spurious interrupt so the priority sorting algorithm
can run again.
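
As a rough model of that approach (not the patch itself): when the
spurious flag is set, count it, write the new-IRQ agreement bit so the
INTC sorts again, and dispatch nothing. The struct, the function name
and the counter are hypothetical; the bit positions (NewIRQAgr as bit 0
of INTC_CONTROL, SpuriousIRQ in SIR bits 31:7) are assumed from the TRM
register description.

```c
#include <stdint.h>

#define ACTIVEIRQ_MASK    0x7fu              /* SIR bits 6:0 (assumed) */
#define SPURIOUSIRQ_MASK  (0x1ffffffu << 7)  /* SIR bits 31:7 (assumed) */
#define CONTROL_NEWIRQAGR 0x1u               /* INTC_CONTROL bit 0 (assumed) */

/* Stand-in for the memory-mapped INTC registers used here. */
struct intc_model {
	uint32_t sir;                 /* value read from INTC_SIR_IRQ */
	uint32_t control;             /* last value written to INTC_CONTROL */
	unsigned int spurious_count;  /* how often sorting was invalid */
};

/* Returns the irq to dispatch, or -1 after acking a spurious result. */
static int intc_model_next_irq(struct intc_model *intc)
{
	uint32_t sir = intc->sir;

	if (sir & SPURIOUSIRQ_MASK) {
		intc->spurious_count++;
		/* Ack so that priority sorting can run again. */
		intc->control = CONTROL_NEWIRQAGR;
		return -1;
	}

	return (int)(sir & ACTIVEIRQ_MASK);
}
```

In this model the caller simply returns from the exception on -1 and
lets the INTC raise the interrupt again once a valid sorting result is
available.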

John Ogness