Re: [PATCH] irqdomain: Fix driver re-inserting failures when IRQs not being freed completely

From: Jie Zhan
Date: Tue Aug 29 2023 - 05:06:21 EST

On 26/08/2023 02:00, Thomas Gleixner wrote:
> On Thu, Jul 20 2023 at 20:24, Jie Zhan wrote:
> > Since commit 4615fbc3788d ("genirq/irqdomain: Don't try to free an
> > interrupt that has no mapping"), we have found failures when
> > re-inserting some specific drivers:
> >
> > [root@localhost ~]# rmmod hisi_sas_v3_hw
> > [root@localhost ~]# modprobe hisi_sas_v3_hw
> > [ 1295.622525] hisi_sas_v3_hw: probe of 0000:30:04.0 failed with error -2
> >
> > This comes from the case where some IRQs allocated from a low-level domain,
> > e.g. GIC ITS, are not freed completely, leaving some leaked. Thus, the next
> > driver insertion fails to get the same number of IRQs because some IRQs are
> > still occupied.
>
> Why?
>
> > Free a contiguous group of IRQs in one go to fix this issue.
>
> Again why?
>
> > @@ -1445,13 +1445,24 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
> >  					   unsigned int nr_irqs)
> >  {
> >  	unsigned int i;
> > +	int n;
> >  	if (!domain->ops->free)
> >  		return;
> >  	for (i = 0; i < nr_irqs; i++) {
> > -		if (irq_domain_get_irq_data(domain, irq_base + i))
> > -			domain->ops->free(domain, irq_base + i, 1);
> > +		/* Find the largest possible span of IRQs to free in one go */
> > +		for (n = 0;
> > +		     ((i + n) < nr_irqs) &&
> > +		      (irq_domain_get_irq_data(domain, irq_base + i + n));
> > +		     n++)
> > +			;
>
> For one this is unreadable gunk. But what's worse it still does not
> explain what this is solving.
>
> It's completely sensible to expect that freeing interrupts in a range
> one by one just works.
>
> So why do we need to work around an obvious low level failure in the
> core code?
>
> Thanks,
>
> tglx

Hi Thomas,

Many thanks for taking a look.

I believe this patch should be completely reworked: it has raised many questions
because it does not explain itself well. Please ignore this version for now.

The story behind the problem is rather long and complicated. The previous discussion
can be found in the link attached.
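
For reference, the span search that the quoted hunk attempts can be modelled
in userspace as below. This is only a sketch under stated assumptions, not the
kernel code and not a proposed fix: the quoted hunk is cut off after its inner
loop, so the rest of the loop body here is a guess at the intent. All model_*
names are hypothetical; model_has_irq_data() stands in for a non-NULL result
from irq_domain_get_irq_data(), and model_free() stands in for
domain->ops->free().

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_IRQS 16

/* Hypothetical model state: true where a virq still has irq_data. */
static bool model_mapped[MODEL_NR_IRQS];

/* Log of the free calls made, so the grouping can be inspected. */
struct model_free_call {
	unsigned int base;
	unsigned int nr;
};
static struct model_free_call model_calls[MODEL_NR_IRQS];
static size_t model_nr_calls;

/* Stand-in for "irq_domain_get_irq_data(domain, virq) != NULL". */
static bool model_has_irq_data(unsigned int virq)
{
	return virq < MODEL_NR_IRQS && model_mapped[virq];
}

/* Stand-in for "domain->ops->free(domain, base, nr)". */
static void model_free(unsigned int base, unsigned int nr)
{
	unsigned int k;

	model_calls[model_nr_calls].base = base;
	model_calls[model_nr_calls].nr = nr;
	model_nr_calls++;

	for (k = 0; k < nr; k++)
		model_mapped[base + k] = false;
}

/* Length of the run of mapped virqs starting at 'start', bounded by 'end'. */
static unsigned int model_span_len(unsigned int start, unsigned int end)
{
	unsigned int n = 0;

	while (start + n < end && model_has_irq_data(start + n))
		n++;

	return n;
}

/* Free [irq_base, irq_base + nr_irqs) one contiguous mapped span at a time. */
static void model_free_irqs(unsigned int irq_base, unsigned int nr_irqs)
{
	unsigned int end = irq_base + nr_irqs;
	unsigned int virq = irq_base;

	while (virq < end) {
		unsigned int n = model_span_len(virq, end);

		if (n) {
			model_free(virq, n);
			virq += n;
		} else {
			virq++;	/* no mapping here, skip this virq */
		}
	}
}
```

Pulling the search into a named helper avoids the empty-bodied for loop. With
virqs 2-4 and 6-7 mapped, model_free_irqs(2, 6) issues two free calls, one per
span, and skips the unmapped virq 5.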

Jie