Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()

From: Guenter Roeck
Date: Fri Dec 16 2022 - 19:46:00 EST


On 12/16/22 05:58, Marc Zyngier wrote:
[ ... ]

With both these fixes applied, it actually then leads to the very
next WARN_ON failing in msi_ctrl_valid... Because ctrl->last ==
hwsize. I think Thomas' initial fix for msi_domain_get_hwsize has
an off-by-1 error, I think we should return MSI_XA_DOMAIN_SIZE for
msi_domain_get_hwsize instead.

Yes, that's a good point, and that's consistent with what
__msi_create_irq_domain() does already, assuming MSI_XA_DOMAIN_SIZE
when info->hwsize is 0. No reason to do something else here.

I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
send it out.

With

7a27b6136dcb (local/testing, testing-msi) genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present
c581d525bb1d genirq/msi: Check for the presence of an irq domain when validating msi_ctrl
9d33edb20f7e Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

I still get the following runtime warning.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 8 at kernel/irq/msi.c:196 .msi_domain_free_descs+0x144/0x170
Modules linked in:
CPU: 0 PID: 8 Comm: kworker/u2:0 Tainted: G N 6.1.0-01957-g7a27b6136dcb #1
Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
Workqueue: nvme-reset-wq .nvme_reset_work
NIP: c000000000107d54 LR: c000000000107d44 CTR: 0000000000000000
REGS: c0000000041e74d0 TRAP: 0700 Tainted: G N (6.1.0-01957-g7a27b6136dcb)
MSR: 0000000080029002 <CE,EE,ME> CR: 44002282 XER: 20000000
IRQMASK: 0
GPR00: c000000000107d44 c0000000041e7770 c000000001629c00 c000000004e748a0
GPR04: 000000005358db0a c000000001ce7a00 c00000000423b5d0 000000004735aaa2
GPR08: 0000000000000002 0000000000000013 c00000000423acc0 c00000000214a998
GPR12: 0000000024002282 c000000002579000 c00000000008e190 c000000004173540
GPR16: 0000000000000000 c0000000043810b8 0000000000000000 0000000000000001
GPR20: c0000000060b22d8 c0000000060a70f0 0000000000000000 c000000001996800
GPR24: c0000000017df6c0 c0000000043810b8 c0000000060b2388 c0000000060b2000
GPR28: ffffffffffffffff c0000000041e7888 c000000006025ac8 c000000004e748a0
NIP [c000000000107d54] .msi_domain_free_descs+0x144/0x170
LR [c000000000107d44] .msi_domain_free_descs+0x134/0x170
Call Trace:
[c0000000041e7770] [c000000000107d44] .msi_domain_free_descs+0x134/0x170 (unreliable)
[c0000000041e7810] [c0000000001085d8] .msi_domain_free_msi_descs_range+0x38/0x70
[c0000000041e78a0] [c0000000008d000c] .pci_msi_teardown_msi_irqs+0x4c/0xa0
[c0000000041e7920] [c0000000008cf9e8] .pci_free_msi_irqs+0x18/0x50
[c0000000041e79a0] [c0000000008cd8d0] .pci_free_irq_vectors+0x80/0xb0
[c0000000041e7a20] [c000000000a6d2a0] .nvme_reset_work+0x870/0x1780
[c0000000041e7bb0] [c000000000080e68] .process_one_work+0x2d8/0x7b0
[c0000000041e7c90] [c0000000000813d8] .worker_thread+0x98/0x4f0
[c0000000041e7d70] [c00000000008e2cc] .kthread+0x13c/0x150
[c0000000041e7e10] [c0000000000005d8] .ret_from_kernel_thread+0x58/0x60
Instruction dump:
7fc3f378 48ff1ca9 60000000 7c7f1b79 41c2002c e8810070 7fc3f378 48ff3491
60000000 813f0000 2c090000 41e2ffb0 <0fe00000> 60000000 60000000 ebc10090
irq event stamp: 98168
hardirqs last enabled at (98167): [<c00000000110a274>] ._raw_spin_unlock_irqrestore+0x84/0xd0
hardirqs last disabled at (98168): [<c000000000010b58>] .program_check_exception+0x38/0x120
softirqs last enabled at (97760): [<c00000000110b4dc>] .__do_softirq+0x60c/0x678
softirqs last disabled at (97749): [<c000000000004d20>] .do_softirq_own_stack+0x30/0x50
---[ end trace 0000000000000000 ]---
nvme nvme0: 1/0/0 default/read/poll queues
nvme nvme0: Ignoring bogus Namespace Identifiers
...

The system boots fine, though. This is seen when booting the ppce500 machine with
e5500 CPU and corenet64_smp_defconfig from nvme.

Guenter