Jason,
CC+ IOMMU folks
On Tue, Nov 30 2021 at 20:17, Jason Gunthorpe wrote:
> On Tue, Nov 30, 2021 at 10:23:16PM +0100, Thomas Gleixner wrote:
>> The real problem is where to store the MSI descriptors because the PCI
>> device has its own real PCI/MSI-X interrupts which means it still shares
>> the storage space.
>
> Er.. I never realized that just looking at the patches :|
>
> That is relevant to all real "IMS" users. IDXD escaped this because
> it, IMHO, wrongly used the mdev with the IRQ layer. The mdev is purely
> a messy artifact of VFIO, it should not be required to make the IRQ
> layers work.
>
> I don't think it makes sense that the msi_desc would point to a mdev,
> the iommu layer consumes the msi_desc_to_dev(), it really should point
> to the physical device that originates the message with a proper
> iommu ops/data/etc.

Looking at the device slices as subdevices with their own struct device
makes a lot of sense from the conceptual level. That makes it pretty
much obvious to manage the MSIs of those devices at this level like we
do for any other device.

Whether mdev is the right encapsulation for these subdevices is an
orthogonal problem.

I surely agree that msi_desc::dev is an interesting question, but we
already have this disconnect of msi_desc::dev and DMA today due to DMA
aliasing. I haven't looked at that in detail yet, but of course the
alias handling is substantially different across the various IOMMU
implementations.

Though I fear there is also a use case for MSI-X and IMS tied to the
same device. That network card you are talking about might end up using
MSI-X for a control block and then IMS for the actual network queues
when it is used as physical function device as a whole, but that's
conceptually a different case.

>> I'm currently tending to partition the index space in the xarray:
>>
>>  0x00000000 - 0x0000ffff          PCI/MSI-X
>>  0x00010000 - 0x0001ffff          NTB
>
> It is OK, with some xarray work it can be range allocating & reserving
> so that the msi_domain_alloc_irqs() flows can carve out chunks of the
> number space..
>
> Another view is the msi_domain_alloc_irqs() flows should have their
> own xarrays..

Yes, I was thinking about that as well. The trivial way would be:

    struct xarray     store[MSI_MAX_STORES];

and then have a store index for each allocation domain. With the
proposed encapsulation of the xarray handling that's definitely
feasible. Whether that buys much is a different question. Let me think
about it some more.
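
To make the store-per-domain idea concrete, here is a rough userspace sketch, not kernel code. It assumes a fixed-size pointer table as a stand-in for each xarray; apart from MSI_MAX_STORES, every name in it (the store IDs, struct desc, struct msi_dev_data, store_insert(), store_lookup()) is hypothetical and only illustrates how a store index could select the container:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical store IDs, one per allocation domain (illustrative only). */
enum msi_store_id {
	MSI_STORE_PCI,		/* regular PCI/MSI-X descriptors */
	MSI_STORE_NTB,		/* descriptors of a second domain, e.g. NTB */
	MSI_MAX_STORES,
};

#define MSI_STORE_SLOTS	8	/* tiny fixed capacity, for illustration */

/* Stand-in for struct msi_desc. */
struct desc {
	unsigned int irq;
};

/* Stand-in for one xarray: a fixed table of descriptor pointers. */
struct store {
	struct desc *slot[MSI_STORE_SLOTS];
};

/* Per-device MSI data holding one store per allocation domain. */
struct msi_dev_data {
	struct store store[MSI_MAX_STORES];
};

/* Insert @d at @index of store @id. Returns 0 on success, -1 on error. */
static int store_insert(struct msi_dev_data *md, enum msi_store_id id,
			unsigned int index, struct desc *d)
{
	if ((unsigned int)id >= MSI_MAX_STORES || index >= MSI_STORE_SLOTS)
		return -1;
	if (md->store[id].slot[index])
		return -1;	/* slot already occupied */
	md->store[id].slot[index] = d;
	return 0;
}

/* Look up @index in store @id. Returns NULL if empty or out of range. */
static struct desc *store_lookup(struct msi_dev_data *md,
				 enum msi_store_id id, unsigned int index)
{
	if ((unsigned int)id >= MSI_MAX_STORES || index >= MSI_STORE_SLOTS)
		return NULL;
	return md->store[id].slot[index];
}
```

In this model every domain starts counting at index 0 in its own store, so the same index in two stores never collides. The price is that each lookup has to carry the store ID along.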

>> which is feasible now with the range modifications and way simpler to do
>> with xarray than with the linked list.
>
> Indeed!

I'm glad you like the approach.
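
For comparison, the partitioned index space from the table quoted above could be modelled roughly as below. This is a userspace sketch under assumptions, not the actual patch: only the two ranges come from the table, while the range IDs and the helpers msi_global_index() and msi_range_of() are made up for illustration.

```c
#include <assert.h>

/* Hypothetical range IDs; only the index ranges come from the table. */
enum msi_range_id {
	MSI_RANGE_PCI,
	MSI_RANGE_NTB,
	MSI_RANGE_MAX,
};

struct msi_index_range {
	unsigned int first;	/* first global index of the range */
	unsigned int last;	/* last global index, inclusive */
};

static const struct msi_index_range msi_ranges[MSI_RANGE_MAX] = {
	[MSI_RANGE_PCI] = { 0x00000000u, 0x0000ffffu },	/* PCI/MSI-X */
	[MSI_RANGE_NTB] = { 0x00010000u, 0x0001ffffu },	/* NTB */
};

/*
 * Translate a range-local index into an index in the single shared
 * store. Returns -1 if the ID is invalid or the local index does not
 * fit into the range.
 */
static long msi_global_index(enum msi_range_id id, unsigned int local)
{
	const struct msi_index_range *r;

	if ((unsigned int)id >= MSI_RANGE_MAX)
		return -1;
	r = &msi_ranges[id];
	if (local > r->last - r->first)
		return -1;
	return (long)(r->first + local);
}

/* Find the range owning a global index; MSI_RANGE_MAX if none does. */
static enum msi_range_id msi_range_of(unsigned int index)
{
	for (int i = 0; i < MSI_RANGE_MAX; i++) {
		if (index >= msi_ranges[i].first && index <= msi_ranges[i].last)
			return (enum msi_range_id)i;
	}
	return MSI_RANGE_MAX;
}
```

Here a single store is shared and each allocation flow only ever sees its own chunk of the number space, e.g. NTB-local index 0 maps to global index 0x00010000.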
Thanks,
tglx