Potential issue with (or misunderstanding of) of_irq_get()

From: Conor Dooley
Date: Fri May 19 2023 - 07:04:10 EST


Hey!

I've run into an issue with of_irq_get() while writing an irqchip driver
and I was hoping that by posting about it I might get some guidance as
to whether I just doing something fundamentally wrong in my code, or
if the specific case was just an oversight.

I've been trying to solve the issue that I pointed out here:
https://lore.kernel.org/linux-gpio/23a69be6-96d3-1c28-f1aa-555e38ff991e@xxxxxxxxxxxxx/

To spare reading that, the TL;DR is that the SoC has 3 GPIO controllers,
with 14, 24 and 32 GPIOs each. All 68 can be used for interrupts.
The PLIC only has 41 interrupts for GPIOs, so there's a bit of extra RTL
sitting between the GPIO controllers and the PLIC, that is runtime
configurable, deciding whether an GPIO gets a PLIC interrupt of its
own or shares an interrupt with other GPIOs from the same GPIO controller.

Since the interrupt router/mux is not part of the GPIO controller blocks,
I have written a driver for the it & changed the representation in the DT
to the below. For each of the 41 interrupts "consumed" by the driver
bound to the irqmux node, I have created a domain.

irqmux: interrupt-controller@20002054 {
interrupt-parent = <&plic>;
interrupts = <13>, <14>, <15>, <16>,
<17>, <18>, <19>, <20>,
<21>, <22>, <23>, <24>,
<25>, <26>, <27>, <28>,
<29>, <30>, <31>, <32>,
<33>, <34>, <35>, <36>,
<37>, <38>, <39>, <40>,
<41>, <42>, <43>, <44>,
<45>, <46>, <47>, <48>,
<49>, <50>, <51>, <52>,
<53>;
};

gpio0: gpio@20120000 {
interrupt-parent = <&irqmux>;
npios = <14>;
interrupts = <0>, <1>, <2>, <3>,
<4>, <5>, <6>, <7>,
<8>, <9>, <10>, <11>,
<12>, <13>;
};

gpio1: gpio@20121000 {
interrupt-parent = <&irqmux>;
npios = <24>;
interrupts = <32>, <33>, <34>, <35>,
<36>, <37>, <38>, <39>,
<40>, <41>, <42>, <43>,
<44>, <45>, <46>, <47>,
<48>, <49>, <50>, <51>,
<52>, <53>, <54>, <55>;
};

gpio2: gpio@20122000 {
interrupt-parent = <&irqmux>;
npios = <32>;
interrupts = <64>, <65>, <66>, <67>,
<68>, <69>, <70>, <71>,
<72>, <73>, <74>, <75>,
<76>, <77>, <78>, <79>,
<80>, <81>, <82>, <83>,
<84>, <85>, <86>, <87>,
<88>, <89>, <90>, <91>,
<92>, <93>, <94>, <95>;
};

This approach in DT allows the GPIO controller driver to not care about
the router/mux configuration, which makes sense to me as it is not part
of those IP blocks.

My irqchip driver was adding domains like so:

for (; i < MPFS_MUX_NUM_IRQS; i++) {
priv->irqchip_data[i].output_hwirq = i;

priv->irqchip_data[i].irq = irq_of_parse_and_map(node, i);

domain = irq_domain_add_linear(node, MPFS_MAX_IRQS_PER_GPIO,
&mpfs_irq_mux_nondirect_domain_ops,
&priv->irqchip_data[i]);

irq_set_chained_handler_and_data(priv->irqchip_data[i].irq,
mpfs_irq_mux_nondirect_handler,
&priv->irqchip_data[i]);
}

In my irqchip's select callback I check the struct irq_fwspec's param[0]
to determine which domain is actually responsible for it.

That's all working nicely & I was doing some cleanup before submitting,
when I noticed that debugfs complained about the fact that I had several
domains hanging off the same of device_node:
debugfs: File ':soc:interrupt-controller@20002054' in directory 'domains' already present!
debugfs: File ':soc:interrupt-controller@20002054' in directory 'domains' already present!

To get around that, I tried to switch to creating fwnodes instead,
one for each domain:

for (; i < MPFS_MUX_NUM_IRQS; i++) {
priv->irqchip_data[i].output_hwirq = i;

priv->irqchip_data[i].irq = irq_of_parse_and_map(node, i);

fwnode = irq_domain_alloc_named_id_fwnode("mpfs-irq-mux", i);

domain = irq_domain_create_linear(fwnode, MPFS_MAX_IRQS_PER_GPIO,
&mpfs_irq_mux_nondirect_domain_ops,
&priv->irqchip_data[i]);

irq_set_chained_handler_and_data(priv->irqchip_data[i].irq,
mpfs_irq_mux_nondirect_handler,
&priv->irqchip_data[i]);
}

That's grand for debugfs, but I then ran into a problem that made me feel
I had designed myself into an incorrect corner.

The GPIO driver requests interrupts using:

nirqs = of_irq_count(node);
for (i = 0; i < nirqs; i++)
ret = platform_get_irq(pdev, i);

producing the following call trace:

mpfs_gpio_probe()
platform_get_irq()
platform_get_irq_optional()
of_irq_get()
irq_find_host()
irq_find_matching_host()
irq_find_matching_fwnode()
irq_find_matching_fwspec()

of_irq_find_host() is problematic for me, because it calls into
irq_find_matching_fwspec() having created a struct irq_fwspec where it
only populated the fwnode.

In irq_find_matching_fwspec(), the check for fwspec->param_count will
always be false & my select callback is never invoked.

Since irq_find_matching_fwspec() falls back to the match callback if
param_count is zero, I have added a match callback:

static int mpfs_irq_mux_nondirect_match(struct irq_domain *d, struct device_node *node,
enum irq_domain_bus_token bus_token)
{
if(!of_device_is_compatible(node, "microchip,mpfs-gpio-irq-mux"))
return false;

return true;
}

But this is a hack, since I have 41 domains using the same match callback,
and will return true as long as just one of them has been created.
It is sufficient though to get me as far along irq_create_of_mapping()
which is called using a struct of_phandle_args, containing the params
array that I rely on.

I appear to have run into a fundamental assumption in of_irq_get(), which
makes me think that I have gone pretty badly wrong here somewhere.
Is this a bug/oversight/omission, or am I just doing something broken in
my code?
Given how long that code seems to be there, I am leading towards the fault
being mine (it usually is!).

The code for this (likely aberration of an) irqchip driver is here [1]
but hopefully the snippets/description is sufficient.

Thanks,
Conor.

1 - https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/commit/?h=gpio-irq&id=d1d654bd3de925a6dc48d6e2222adb3d310b8f9d

Attachment: signature.asc
Description: PGP signature