RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()

From: Tian, Kevin
Date: Thu Dec 09 2021 - 07:31:13 EST

Next message: Mark Brown: "Re: linux-next: Tree for Dec 8"
Previous message: Andy Shevchenko: "[PATCH v1 1/1] percpu_ref: Replace kernel.h with the necessary inclusions"
In reply to: Thomas Gleixner: "RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()"
Next in thread: Jason Gunthorpe: "Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Sent: Thursday, December 9, 2021 4:37 PM
>
> On Thu, Dec 09 2021 at 05:23, Kevin Tian wrote:
> >> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> >> I don't see anything wrong with that. A subdevice is its own entity and
> >> VFIO can choose the most convenient representation of it to the guest
> >> obviously.
> >>
> >> How that is backed on the host does not really matter. You can expose
> >> MSI-X to the guest with an INTx backing as well.
> >>
> >
> > Agree with this point. How the interrupts are represented to the guest
> > is orthogonal to how the backend resource is allocated. Physically MSI-X
> > and IMS can be enabled simultaneously on an IDXD device. Once
> > dynamic allocation is allowed for both, either one can be allocated for
> > a subdevice (the only difference being the number of supported subdevices).
> >
> > When an interrupt resource is exposed to the guest with the same type
> > (e.g. MSI-on-MSI or IMS-on-IMS), it can also be passed through to the
> > guest as long as hypercall machinery is in place to get the addr/data
> > pair from the host (as you suggested earlier).
>
> As I pointed out in the conclusion of this thread, IMS is only going to
> be supported with interrupt remapping in place on both host and guest.

I still need to read the last few mails but thanks for pointing it out now.

>
> As these devices require a vIOMMU on the guest anyway (PASID, User
> IO page tables), the required hypercalls are part of the vIOMMU/IR
> implementation. If you look at it from the irqdomain hierarchy view:
>
>                          |- PCI-MSI
>   VECTOR -- [v]IOMMU/IR -|- PCI-MSI-X
>                          |- PCI-IMS
>
> So host and guest use just the same representation which makes a ton of
> sense.
>
> There are two places where this matters:
>
> 1) The activate() callback of the IR domain
>
> 2) The irq_set_affinity() callback of the irqchip associated with the
> IR domain
>
> Both callbacks are allowed to fail and the error code is handed back to
> the originating call site.
>
> If you look at the above hierarchy view then MSI/MSI-X/IMS are all
> treated in exactly the same way. It all becomes the common case.
>
> No?
>

Yes, I think the above makes sense.

For a new guest OS that supports this enlightened hierarchy, the same
machinery works for all types of interrupt storage, and we have a
failure path from host to guest in case of a host-side resource shortage.
No trap is required on guest access to the interrupt storage.
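
To make the failure path concrete, here is a rough userspace sketch in C.
It is a toy model only: struct toy_domain, toy_take_entry(), toy_activate()
and toy_demo() are names made up for illustration and are not the kernel's
irqdomain API. It assumes the remapping level is the only one with a
limited resource, and shows that the error reaches the caller in the same
way whichever interrupt type sits on top.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model only: these types and names are illustrative, not kernel API. */
struct toy_domain {
	const char *name;
	struct toy_domain *parent;              /* NULL at the root (VECTOR) */
	int (*activate)(struct toy_domain *d);  /* allowed to fail */
	int free_entries;                       /* host-side resources left */
};

/* Consume one entry; report -ENOSPC once the domain has run out. */
static int toy_take_entry(struct toy_domain *d)
{
	if (d->free_entries <= 0)
		return -ENOSPC;
	d->free_entries--;
	return 0;
}

/* Activate from the root down; the first error goes back to the caller. */
static int toy_activate(struct toy_domain *d)
{
	int ret;

	if (!d)
		return 0;
	ret = toy_activate(d->parent);
	if (ret)
		return ret;
	return d->activate ? d->activate(d) : 0;
}

/*
 * Build VECTOR -- IR -- {MSI, MSI-X, IMS} with ir_entries remapping
 * entries and activate one interrupt of each type, in that order.
 */
static void toy_demo(int ir_entries, int results[3])
{
	struct toy_domain vector = { "VECTOR", NULL, NULL, 0 };
	struct toy_domain ir = { "IOMMU/IR", &vector, toy_take_entry, ir_entries };
	struct toy_domain leaf[3] = {
		{ "PCI-MSI", &ir, NULL, 0 },
		{ "PCI-MSI-X", &ir, NULL, 0 },
		{ "PCI-IMS", &ir, NULL, 0 },
	};
	int i;

	for (i = 0; i < 3; i++)
		results[i] = toy_activate(&leaf[i]);
}
```

Calling toy_demo(2, results) leaves the first two results at 0 and the
third at -ENOSPC. Nothing in the sketch depends on the third being IMS;
the shortage is reported to whichever caller asks last.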

A legacy guest OS that doesn't support the enlightened hierarchy
can only use MSI/MSI-X, which is still trapped. But with the vector
reallocation support from your work, the situation is already much
better than the current awkward approach in VFIO (free all previous
vectors and then re-allocate).
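
The difference between the two approaches can be sketched with another toy
model in C. Again this is not VFIO or kernel code: struct toy_vecs and the
toy_grow_*() helpers are hypothetical, and "teardowns" simply counts how
often a vector that was already live gets freed while the guest grows its
vector count.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_VECS 8

/* Toy model only: not VFIO or kernel code. */
struct toy_vecs {
	bool allocated[TOY_MAX_VECS];
	unsigned int teardowns;   /* times a live vector was freed */
};

/* Allocate every missing vector below count; live ones are untouched. */
static int toy_alloc_upto(struct toy_vecs *v, unsigned int count)
{
	unsigned int i;

	if (count > TOY_MAX_VECS)
		return -EINVAL;
	for (i = 0; i < count; i++)
		v->allocated[i] = true;
	return 0;
}

/* "Free all previous vectors and then re-allocate". */
static int toy_grow_free_all(struct toy_vecs *v, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < TOY_MAX_VECS; i++) {
		if (v->allocated[i]) {
			v->allocated[i] = false;
			v->teardowns++;
		}
	}
	return toy_alloc_upto(v, count);
}

/* Dynamic allocation: only add what is missing. */
static int toy_grow_dynamic(struct toy_vecs *v, unsigned int count)
{
	return toy_alloc_upto(v, count);
}
```

Growing from 2 to 4 vectors with toy_grow_free_all() tears down the two
live vectors first, while toy_grow_dynamic() leaves them alone and only
adds vectors 2 and 3.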

Overall I think this is a good model.

Thanks
Kevin
