RE: Virtualizing MSI-X on IMS via VFIO

From: Thomas Gleixner
Date: Fri Jun 25 2021 - 04:43:16 EST


On Fri, Jun 25 2021 at 05:21, Kevin Tian wrote:
>> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
>> So caching/latching occurs on unmask for MSI-X, but I can't find
>> similar statements for MSI. If you have, please note them. It's
>> possible MSI is per interrupt.
>
> I checked PCI Local Bus Specification rev3.0. At that time MSI and
> MSI-X were described/compared together in almost every paragraph
> in 6.8.3.4 (Per-vector Masking and Function Masking). The paragraph
> that you cited is the last one in that section. It's a pity that MSI is
> not clarified in this paragraph but it gives me the impression that
> MSI function is not permitted to cache address and data values.
> Later after MSI and MSI-X descriptions were split into separate
> sections in PCIe spec, this impression is definitely weakened a lot.
>
> If true, this even implies that software is free to change data/addr
> when MSI is unmasked, which is sort of counter-intuitive to most
> people.

Yes, software is free to do that and it has to deal with the
consequences. See arch/x86/kernel/apic/msi.c::msi_set_affinity().
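
For illustration, the kind of consequence meant here can be modelled in
user space. This is only a sketch, not the kernel code: the struct and
function names are hypothetical, and it assumes the address and data
registers are updated by two separate writes while the interrupt stays
unmasked. A device raising an interrupt between the two writes would
send a message made of the new address and the old data:

```c
#include <stdint.h>

/* Hypothetical model of the MSI message registers (not a kernel type). */
struct msi_regs_model {
	uint32_t address;	/* selects the interrupt destination */
	uint16_t data;		/* carries the vector number */
};

/*
 * Update address first, then data, as two separate writes with the
 * interrupt left unmasked. *torn receives the message the device would
 * send if it fired between the two writes.
 */
static void msi_update_unmasked(struct msi_regs_model *regs,
				uint32_t new_address, uint16_t new_data,
				struct msi_regs_model *torn)
{
	regs->address = new_address;	/* first write lands */
	*torn = *regs;			/* window: new address, old data */
	regs->data = new_data;		/* second write lands */
}
```

The intermediate state pairs the new destination with the old vector,
which is the sort of situation software has to be prepared for when it
changes the message of an unmasked MSI.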

> Then I further found below thread:
>
> https://lore.kernel.org/lkml/1468426713-31431-1-git-send-email-marc.zyngier@xxxxxxx/
>
> It identified a device which does latch the message content in a
> MSI-capable device, forcing the kernel to startup irq early before
> enabling MSI capability.
>
> So, no answer and let's see whether Thomas can help identify
> a better proof.

As I said to Alex: The MSI specification is and always was blurry and the
behaviour in detail is implementation defined. IOW, what might work on
device A is not guaranteed to work on device B.

> p.s. one question to Thomas. As Alex cited above, software must
> not modify the Address, Data, or Steering Tag fields of an MSI-X
> entry while it is unmasked. However this rule might be violated
> today in below flow:
>
> request_irq()
>     __setup_irq()
>         irq_startup()
>             __irq_startup()
>                 irq_enable()
>                     unmask_irq() <<<<<<<<<<<<<
>             irq_setup_affinity()
>                 irq_do_set_affinity()
>                     msi_set_affinity() // when IR is disabled
>                         irq_msi_update_msg()
>                             pci_msi_domain_write_msg() <<<<<<<<<<<<<<
>
> Doesn't the above flow update the MSI-X entry after it's unmasked?

Dammit, I could swear that we had masking at the core or PCI level at
some point. Let me dig into this.
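
As a rough illustration of what such masking would look like, here is a
minimal user-space sketch, assuming an in-memory model of one MSI-X
table entry. All names are hypothetical and this is not the kernel
implementation; the only fact taken from the spec is that bit 0 of the
Vector Control word is the per-vector mask bit. The helper masks the
entry, writes the message, and then restores the previous mask state:

```c
#include <stdbool.h>
#include <stdint.h>

#define MSIX_CTRL_MASKBIT 0x1u	/* bit 0 of Vector Control: per-vector mask */

/* Hypothetical in-memory model of one MSI-X table entry. */
struct msix_entry_model {
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t data;
	uint32_t vector_ctrl;
	unsigned int unmasked_writes;	/* writes done while unmasked */
};

static bool entry_masked(const struct msix_entry_model *e)
{
	return (e->vector_ctrl & MSIX_CTRL_MASKBIT) != 0;
}

/* Raw message write; records a rule violation if the entry is unmasked. */
static void entry_write_msg(struct msix_entry_model *e,
			    uint32_t lo, uint32_t hi, uint32_t data)
{
	if (!entry_masked(e))
		e->unmasked_writes++;
	e->addr_lo = lo;
	e->addr_hi = hi;
	e->data = data;
}

/* Mask the entry, write the message, restore the previous mask state. */
static void entry_update_msg_masked(struct msix_entry_model *e,
				    uint32_t lo, uint32_t hi, uint32_t data)
{
	uint32_t saved_ctrl = e->vector_ctrl;

	e->vector_ctrl |= MSIX_CTRL_MASKBIT;
	entry_write_msg(e, lo, hi, data);
	e->vector_ctrl = saved_ctrl;
}
```

With a helper of this shape, the affinity update in the flow above
would no longer touch an unmasked entry, at the cost of an extra mask
and unmask around every message write.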

Thanks,

tglx