Re: [PATCH] genirq/msi: Initialize msi_alloc_info to zero for msi_prepare API

From: Zenghui Yu
Date: Sun Dec 20 2020 - 23:50:51 EST


Hi Marc,

On 2020/12/19 1:38, Marc Zyngier wrote:
Hi Zenghui,

On Fri, 18 Dec 2020 06:00:39 +0000,
Zenghui Yu <yuzenghui@xxxxxxxxxx> wrote:

Since commit 5fe71d271df8 ("irqchip/gic-v3-its: Tag ITS device as shared if
allocating for a proxy device"), some of the devices are wrongly marked as
"shared" by the ITS driver on systems equipped with the ITS(es). The
problem is that the @info->flags may not be initialized anywhere and we end
up looking at random bits on the stack. That's obviously not good.

The straightforward fix is to properly initialize msi_alloc_info inside the
.prepare callback of affected MSI domains (its-pci-msi, its-platform-msi,
etc). We can also perform the initialization in IRQ core layer for
msi_domain_prepare_irqs() API and it looks much neater to me.

Signed-off-by: Zenghui Yu <yuzenghui@xxxxxxxxxx>
---

This was noticed when I was playing with the assigned devices on arm64 and
VFIO failed to enable MSI-X vectors for almost all VFs (CCed kvm list in
case others will hit the same issue). It turned out that these VFs are
marked as "shared" by mistake and have trouble with the following sequence:

pci_alloc_irq_vectors(pdev, 1, 1, flag);
pci_free_irq_vectors(pdev);
pci_alloc_irq_vectors(pdev, 1, 2, flag); --> we can only get
*one* vector

But besides VFIO, I guess there are already some devices get into trouble
at probe time and can't work properly.

kernel/irq/msi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 2c0c4d6d0f83..dc0e2d7fbdfd 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -402,7 +402,7 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
struct msi_domain_ops *ops = info->ops;
struct irq_data *irq_data;
struct msi_desc *desc;
- msi_alloc_info_t arg;
+ msi_alloc_info_t arg = { };
int i, ret, virq;
bool can_reserve;

Thanks for having investigated this. I guess my only worry with this
is that msi_alloc_info_t is a pretty large structure on x86, and
zeroing it isn't totally free.

It seems that x86 will zero the whole msi_alloc_info_t structure at the
beginning of its .prepare (pci_msi_prepare()/init_irq_alloc_info()). If
this really affects something, I think we can easily address it with
some cleanup (on top of this patch).

But this definitely looks nicer than
some of the alternatives (.prepare isn't a good option, as we do rely
on the flag being set in __platform_msi_create_device_domain(), which
calls itself .prepare).

Indeed, thanks for fixing the commit message.

I'll queue it, and we can always revisit this later if Thomas (or
anyone else) has a better idea.

Thanks!


Zenghui