Re: Query on handling some special Group0 interrupt in Linux

From: Mukesh Ojha
Date: Wed Nov 09 2022 - 14:58:04 EST


Hi Marc,

Thanks for your reply.

On 11/9/2022 11:50 PM, Marc Zyngier wrote:
> On Wed, 09 Nov 2022 16:20:35 +0000,
> Mukesh Ojha <quic_mojha@xxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> I am working on a use case where both EL2 and EL3 are implemented, and
>> we have a watchdog interrupt (SPI) that is used to detect software
>> hangs and reset the device. If that interrupt's current CPU affinity
>> is a core where interrupts are disabled, we won't be able to serve it.
>> And if the interrupt arrives on a core that has interrupts enabled,
>> calling panic() or smp_send_stop() still won't give us the call
>> stacks of the other cores that are running with interrupts disabled.
>>
>> I was thinking of configuring both the watchdog IRQ (SPI) and IPI_STOP
>> (SGI), or any reserved IPI, as FIQs. The watchdog IRQ handler would
>> then call panic(), which eventually sends IPI_STOP (as an SGI FIQ) to
>> all the cores. With this we would be able to dump the call stacks of
>> all the cores.
>>
>> I have been able to get this working, but wanted to know whether the
>> community would accept supporting use cases like the one above, i.e.
>> enabling Group-0 interrupts from the GIC for some special cases.
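To make the problem concrete, here is a toy C model (not kernel code) of the dump-on-panic scenario described above. The `struct cpu_state` type and `send_stop()` function are hypothetical names used only for illustration: a stop request sent as a normal IRQ is never taken by a CPU that has interrupts masked, whereas one delivered through a class that IRQ masking does not block (FIQ or pseudo-NMI, modeled here as a single flag) reaches every other CPU. It deliberately ignores real-world details such as whether the kernel masks FIQs together with IRQs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the model. */
struct cpu_state {
    bool irqs_masked;   /* CPU is running with normal interrupts disabled */
    bool stack_dumped;  /* CPU handled the stop request and dumped its stack */
};

/*
 * Send a "stop and dump your stack" request from 'sender' to all other CPUs.
 * If 'unmaskable' is false, the request behaves like a normal IRQ and is
 * not taken by CPUs with interrupts masked; if true, it models an FIQ or
 * pseudo-NMI that IRQ masking does not block.
 * Returns how many CPUs dumped their stack.
 */
static int send_stop(struct cpu_state cpus[], int nr_cpus, int sender,
                     bool unmaskable)
{
    int dumped = 0;

    assert(nr_cpus > 0 && sender >= 0 && sender < nr_cpus);

    for (int i = 0; i < nr_cpus; i++) {
        if (i == sender)
            continue;
        if (unmaskable || !cpus[i].irqs_masked) {
            cpus[i].stack_dumped = true;
            dumped++;
        }
    }
    return dumped;
}
```

For example, with four CPUs where CPU 1 and CPU 3 are spinning with interrupts disabled, a call from CPU 0 with `unmaskable` set to false reaches only CPU 2, so the stacks of the two stuck CPUs, usually the interesting ones, are never collected.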

> For a start, we only deal with Group-1 interrupts in Linux. Group-0
> interrupts are for the firmware, and we really don't want to see them
> (this is consistent with your HW having EL3).

What is the downside of supporting this? I see one such implementation
here:

https://elixir.bootlin.com/linux/v6.0.7/source/drivers/irqchip/irq-apple-aic.c#L510

> We also mask IRQ and FIQ at the same time, so this is a non-starter.
That could be taken care of as part of adding such support.


> If you want to be able to deliver an interrupt while the interrupts
> are masked, what you are looking for is the NMI framework, for which
> you can register SPIs as (pseudo-)NMI.

Yes, something like an NMI. I have already looked into this.
However, in our system EL2 is implemented, so every physical interrupt
is routed to the hypervisor and then delivered to EL1 as a virtual IRQ,
and every interrupt enable/disable call exercises a PMR register trap.
This can add latency during regular operation (e.g. with multiple VMs).

Some use cases can be special, like the one I mentioned in my initial
mail, where such an interrupt is fatal and the system gets reset
afterwards. I can't think of any use case other than this one, but
could this not be considered as a feature?


> This is of course assuming that you're using GICv3. If you're using an
> older version of the architecture, we don't have a good solution for
> you, unfortunately.


We are using GICv3.
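For reference, the pseudo-NMI scheme Marc points to works, roughly, by priority masking: with pseudo-NMI enabled (CONFIG_ARM64_PSEUDO_NMI plus the irqchip.gicv3_pseudo_nmi=1 boot parameter), arm64 Linux disables "normal" interrupts by changing the GICv3 priority mask (ICC_PMR_EL1) instead of setting PSTATE.I, and interrupts requested as NMIs (in the kernel, via request_nmi()) get a higher priority that stays above that mask. The toy C model below shows only the comparison involved. The function names `gic_would_signal()` and `pmr_for()` are hypothetical, and the priority constants are illustrative assumptions loosely based on the kernel's scheme, not values to rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative 8-bit GIC priority values (lower value = higher priority).
 * These are assumptions for the model, loosely following the arm64
 * pseudo-NMI scheme; they are not read from real hardware or the kernel.
 */
#define PRIO_NORMAL_IRQ 0xa0u /* ordinary device interrupt */
#define PRIO_NMI        0x20u /* interrupt requested as a pseudo-NMI */
#define PMR_IRQS_ON     0xe0u /* mask value while interrupts are "enabled" */
#define PMR_IRQS_OFF    0x60u /* mask value while interrupts are "disabled" */

/*
 * A GICv3 CPU interface only signals an interrupt whose priority is
 * higher than the priority mask, i.e. whose value is numerically lower.
 */
static bool gic_would_signal(uint8_t irq_prio, uint8_t pmr)
{
    assert(irq_prio <= 0xffu);
    return irq_prio < pmr;
}

/* Analogue of local_irq_disable()/local_irq_enable(): only the mask moves. */
static uint8_t pmr_for(bool irqs_disabled)
{
    return irqs_disabled ? PMR_IRQS_OFF : PMR_IRQS_ON;
}
```

With the mask at PMR_IRQS_OFF, a normal interrupt (0xa0) is held back while the NMI (0x20) is still signaled, which is what lets a watchdog SPI registered as a pseudo-NMI interrupt a core spinning with interrupts disabled. The trade-off is that every interrupt mask/unmask becomes a PMR write, which is the overhead being discussed above.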

> Thanks,
>
> M.


-Mukesh