RE: [RFC PATCH V3 00/26] vfio/pci: Back guest interrupts from Interrupt Message Store (IMS)

From: Tian, Kevin
Date: Fri Nov 03 2023 - 03:23:25 EST


> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Friday, November 3, 2023 5:14 AM
>
> On Thu, 2 Nov 2023 03:14:09 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
> > > From: Tian, Kevin
> > > Sent: Thursday, November 2, 2023 10:52 AM
> > >
> > > >
> > > > Without an in-tree user of this code, we're just chopping up code for
> > > > no real purpose. There's no reason that a variant driver requiring IMS
> > > > couldn't initially implement their own SET_IRQS ioctl. Doing that
> > >
> > > this is an interesting idea. We haven't seen a real usage which wants
> > > such MSI emulation on IMS for variant drivers. but if the code is
> > > simple enough to demonstrate the 1st user of IMS it might not be
> > > a bad choice. There are additional trap-emulation required in the
> > > device MMIO bar (mostly copying MSI permission entry which contains
> > > PASID info to the corresponding IMS entry). At a glance that area
> > > is 4k-aligned so should be doable.
> > >
> >
> > misread the spec. the MSI-X permission table which provides
> > auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st
> > 4k page together with many other registers. emulation of them
> > could be simple with a native read/write handler but not sure
> > whether any of them may sit in a hot path to affect perf due to
> > trap...
>
> I'm not sure if you're referring to a specific device spec or the PCI
> spec, but the PCI spec has long included an implementation note
> suggesting alignment of the MSI-X vector table and pba and separation
> from CSRs, and I see this is now even more strongly worded in the 6.0
> spec.
>
> Note though that for QEMU, these are emulated in the VMM and not
> written through to the device. The result of writes to the vector
> table in the VMM are translated to vector use/unuse operations, which
> we see at the kernel level through SET_IRQS ioctl calls. Are you
> expecting to get PASID information written by the guest through the
> emulated vector table? That would entail something more than a simple
> IMS backend to MSI-X frontend. Thanks,
>

I was referring to IDXD device spec. Basically it allows a process to
submit a descriptor which contains a completion interrupt handle.
The handle is the index of a MSI-X entry or IMS entry allocated by
the idxd driver. To mark the association between application and
related handles the driver records the PASID of the application
in an auxiliary structure for MSI-X (called MSI-X permission table)
or directly in the IMS entry. This additional info includes whether
an MSI-X/IMS entry has PASID enabled and if yes what is the PASID
value to be checked against the descriptor.

As you said virtualizing MSI-X table itself is via SET_IRQS and it's
4k aligned. Then we also need to capture guest updates to the MSI-X
permission table and copy the PASID information into the
corresponding IMS entry when using the IMS backend. It's MSI-X
permission table not 4k aligned then trapping it will affect adjacent
registers.

My quick check in idxd spec doesn't reveal an real impact in perf
critical path. Most registers are configuration/control registers
accessed at driver init time and a few interrupt registers related
to errors or administrative purpose.