Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()

From: Marc Zyngier
Date: Sat Dec 17 2022 - 05:47:00 EST


On Sat, 17 Dec 2022 00:45:50 +0000,
Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> On 12/16/22 05:58, Marc Zyngier wrote:
> [ ... ]
>
> >> With both these fixes applied, it actually then leads to the very
> >> next WARN_ON failing in msi_ctrl_valid... Because ctrl->last ==
> >> hwsize. I think Thomas' initial fix for msi_domain_get_hwsize has
> >> an off-by-1 error, I think we should return MSI_XA_DOMAIN_SIZE for
> >> msi_domain_get_hwsize instead.
> >
> > Yes, that's a good point, and that's consistent with what
> > __msi_create_irq_domain() does already, assuming MSI_XA_DOMAIN_SIZE
> > when info->hwsize is 0. No reason to do something else here.
> >
> > I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
> > send it out.
> >
> With
>
> 7a27b6136dcb (local/testing, testing-msi) genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present
> c581d525bb1d genirq/msi: Check for the presence of an irq domain when validating msi_ctrl
> 9d33edb20f7e Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> I still get the following runtime warning.
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 8 at kernel/irq/msi.c:196 .msi_domain_free_descs+0x144/0x170
> Modules linked in:
> CPU: 0 PID: 8 Comm: kworker/u2:0 Tainted: G N 6.1.0-01957-g7a27b6136dcb #1
> Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
> Workqueue: nvme-reset-wq .nvme_reset_work
> NIP: c000000000107d54 LR: c000000000107d44 CTR: 0000000000000000
> REGS: c0000000041e74d0 TRAP: 0700 Tainted: G N (6.1.0-01957-g7a27b6136dcb)
> MSR: 0000000080029002 <CE,EE,ME> CR: 44002282 XER: 20000000
> IRQMASK: 0
> GPR00: c000000000107d44 c0000000041e7770 c000000001629c00 c000000004e748a0
> GPR04: 000000005358db0a c000000001ce7a00 c00000000423b5d0 000000004735aaa2
> GPR08: 0000000000000002 0000000000000013 c00000000423acc0 c00000000214a998
> GPR12: 0000000024002282 c000000002579000 c00000000008e190 c000000004173540
> GPR16: 0000000000000000 c0000000043810b8 0000000000000000 0000000000000001
> GPR20: c0000000060b22d8 c0000000060a70f0 0000000000000000 c000000001996800
> GPR24: c0000000017df6c0 c0000000043810b8 c0000000060b2388 c0000000060b2000
> GPR28: ffffffffffffffff c0000000041e7888 c000000006025ac8 c000000004e748a0
> NIP [c000000000107d54] .msi_domain_free_descs+0x144/0x170
> LR [c000000000107d44] .msi_domain_free_descs+0x134/0x170
> Call Trace:
> [c0000000041e7770] [c000000000107d44] .msi_domain_free_descs+0x134/0x170 (unreliable)
> [c0000000041e7810] [c0000000001085d8] .msi_domain_free_msi_descs_range+0x38/0x70
> [c0000000041e78a0] [c0000000008d000c] .pci_msi_teardown_msi_irqs+0x4c/0xa0
> [c0000000041e7920] [c0000000008cf9e8] .pci_free_msi_irqs+0x18/0x50
> [c0000000041e79a0] [c0000000008cd8d0] .pci_free_irq_vectors+0x80/0xb0
> [c0000000041e7a20] [c000000000a6d2a0] .nvme_reset_work+0x870/0x1780
> [c0000000041e7bb0] [c000000000080e68] .process_one_work+0x2d8/0x7b0
> [c0000000041e7c90] [c0000000000813d8] .worker_thread+0x98/0x4f0
> [c0000000041e7d70] [c00000000008e2cc] .kthread+0x13c/0x150
> [c0000000041e7e10] [c0000000000005d8] .ret_from_kernel_thread+0x58/0x60
> Instruction dump:
> 7fc3f378 48ff1ca9 60000000 7c7f1b79 41c2002c e8810070 7fc3f378 48ff3491
> 60000000 813f0000 2c090000 41e2ffb0 <0fe00000> 60000000 60000000 ebc10090
> irq event stamp: 98168
> hardirqs last enabled at (98167): [<c00000000110a274>] ._raw_spin_unlock_irqrestore+0x84/0xd0
> hardirqs last disabled at (98168): [<c000000000010b58>] .program_check_exception+0x38/0x120
> softirqs last enabled at (97760): [<c00000000110b4dc>] .__do_softirq+0x60c/0x678
> softirqs last disabled at (97749): [<c000000000004d20>] .do_softirq_own_stack+0x30/0x50
> ---[ end trace 0000000000000000 ]---
> nvme nvme0: 1/0/0 default/read/poll queues
> nvme nvme0: Ignoring bogus Namespace Identifiers
> ...
>
> The system boots fine, though. This is seen when booting the ppce500
> machine with e5500 CPU and corenet64_smp_defconfig from nvme.

+PPC folks.

Thanks for the report.

I managed to reproduce this, although in a limited way (a SMP qemu
instance wouldn't boot at all). The problem is that the core MSI code
now assumes that if the arch code is in charge of breaking the
association of a MSI with a device, it is also in charge of clearing
the irq in the MSI descriptor.

This matches the s390 behaviour, but not powerpc's, hence the splat
and the leaked MSI descriptors. The minimal fix should be as follow,
which I'll add to the pile of patches.

Thanks,

M.

diff --git a/arch/powerpc/platforms/4xx/hsta_msi.c b/arch/powerpc/platforms/4xx/hsta_msi.c
index d4f7fff1fc87..e11b57a62b05 100644
--- a/arch/powerpc/platforms/4xx/hsta_msi.c
+++ b/arch/powerpc/platforms/4xx/hsta_msi.c
@@ -115,6 +115,7 @@ static void hsta_teardown_msi_irqs(struct pci_dev *dev)
msi_bitmap_free_hwirqs(&ppc4xx_hsta_msi.bmp, irq, 1);
pr_debug("%s: Teardown IRQ %u (index %u)\n", __func__,
entry->irq, irq);
+ entry->irq = 0;
}
}

diff --git a/arch/powerpc/platforms/cell/axon_msi.c b/arch/powerpc/platforms/cell/axon_msi.c
index 5b012abca773..0c11aad896c7 100644
--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -289,6 +289,7 @@ static void axon_msi_teardown_msi_irqs(struct pci_dev *dev)
msi_for_each_desc(entry, &dev->dev, MSI_DESC_ASSOCIATED) {
irq_set_msi_desc(entry->irq, NULL);
irq_dispose_mapping(entry->irq);
+ entry->irq = 0;
}
}

diff --git a/arch/powerpc/platforms/pasemi/msi.c b/arch/powerpc/platforms/pasemi/msi.c
index dc1846660005..166c97fff16d 100644
--- a/arch/powerpc/platforms/pasemi/msi.c
+++ b/arch/powerpc/platforms/pasemi/msi.c
@@ -66,6 +66,7 @@ static void pasemi_msi_teardown_msi_irqs(struct pci_dev *pdev)
hwirq = virq_to_hw(entry->irq);
irq_set_msi_desc(entry->irq, NULL);
irq_dispose_mapping(entry->irq);
+ entry->irq = 0;
msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, ALLOC_CHUNK);
}
}
diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 73c2d70706c0..57978a44d55b 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -132,6 +132,7 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
msi_data = irq_get_chip_data(entry->irq);
irq_set_msi_desc(entry->irq, NULL);
irq_dispose_mapping(entry->irq);
+ entry->irq = 0;
msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
}
}
diff --git a/arch/powerpc/sysdev/mpic_u3msi.c b/arch/powerpc/sysdev/mpic_u3msi.c
index 1d8cfdfdf115..492cb03c0b62 100644
--- a/arch/powerpc/sysdev/mpic_u3msi.c
+++ b/arch/powerpc/sysdev/mpic_u3msi.c
@@ -108,6 +108,7 @@ static void u3msi_teardown_msi_irqs(struct pci_dev *pdev)
hwirq = virq_to_hw(entry->irq);
irq_set_msi_desc(entry->irq, NULL);
irq_dispose_mapping(entry->irq);
+ entry->irq = 0;
msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, 1);
}
}

--
Without deviation from the norm, progress is not possible.