RE: [PATCH V4 10/11] vfio/pci: Support dynamic MSI-X

From: Tian, Kevin
Date: Fri May 05 2023 - 04:10:54 EST


> From: Chatre, Reinette <reinette.chatre@xxxxxxxxx>
> Sent: Saturday, April 29, 2023 2:35 AM
>
> Hi Kevin,
>
> On 4/27/2023 11:50 PM, Tian, Kevin wrote:
> >> From: Chatre, Reinette <reinette.chatre@xxxxxxxxx>
> >> Sent: Friday, April 28, 2023 1:36 AM
> >>
> >> pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be
> >> allocated after MSI-X enabling.
> >>
> >> Use dynamic MSI-X (if supported by the device) to allocate an interrupt
> >> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at
> >> the time a valid eventfd is assigned. This is different behavior from
> >> a range provided during MSI-X enabling where interrupts are allocated
> >> for the entire range whether a valid eventfd is provided for each
> >> interrupt or not.
> >>
> >> The PCI-MSIX API requires that some number of irqs are allocated for
> >> an initial set of vectors when enabling MSI-X on the device. When
> >> dynamic MSIX allocation is not supported, the vector table, and thus
> >> the allocated irq set can only be resized by disabling and re-enabling
> >> MSI-X with a different range. In that case the irq allocation is
> >> essentially a cache for configuring vectors within the previously
> >> allocated vector range. When dynamic MSI-X allocation is supported,
> >> the API still requires some initial set of irqs to be allocated, but
> >> also supports allocating and freeing specific irq vectors both
> >> within and beyond the initially allocated range.
> >>
> >> For consistency between modes, as well as to reduce latency and improve
> >> reliability of allocations, and also simplicity, this implementation
> >> only releases irqs via pci_free_irq_vectors() when either the interrupt
> >> mode changes or the device is released.
> >
> > It improves the reliability of allocations from the calling device p.o.v.
> >
> > But system-wide this is not efficient use of irqs and not releasing them
> > timely may affect the reliability of allocations for other devices.
>
> Could you please elaborate how other devices may be impacted?

the more this devices reserves the less remains for others, e.g. irte entries.

>
> > Should this behavior be something configurable?
>
> This is not clear to me and I look to you for guidance here. From practical
> side it looks like configuration via module parameters is supported but
> whether it should be done is not clear to me.
>
> When considering this we need to think about what the user may expect
> when
> turning on/off the configuration. For example, MSI-X continues to allocate a
> range of interrupts during enabling. These have always been treated as a
> "cache" (interrupts remain allocated, whether they have an associated
> trigger
> or not). If there is new configurable behavior, do you expect that the
> driver needs to distinguish between the original "cache" that the user is
> used to and the new dynamic allocations? That is, should a dynamic MSI-X
> capable device always free interrupts when user space removes an eventfd
> or should only interrupts that were allocated dynamically be freed
> dynamically?

That looks tricky. Probably that is why Alex suggested doing this simple
scheme and it is on par with the old logic anyway. So I'll withdraw this
comment.

>
> >> +/*
> >> + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector.
> >> + * If a Linux IRQ number is not available then a new interrupt will be
> >> + * allocated if dynamic MSI-X is supported.
> >> + */
> >> +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
> >> + unsigned int vector, bool msix)
> >> +{
> >> + struct pci_dev *pdev = vdev->pdev;
> >> + struct msi_map map;
> >> + int irq;
> >> + u16 cmd;
> >> +
> >> + irq = pci_irq_vector(pdev, vector);
> >> + if (irq > 0 || !msix || !vdev->has_dyn_msix)
> >> + return irq;
> >
> > if (irq >= 0 || ...)
> >
>
> I am not sure about this request because pci_irq_vector() cannot return 0.
> The Linux interrupt number will be > 0 on success. 0 means "not found"
> (see msi_get_virq()), which is translated to -EINVAL by pci_irq_vector().
>

There is a subtle difference between the description and the code of
pci_irq_vector().

/**
* pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
* @dev: the PCI device to operate on
* @nr: device-relative interrupt vector index (0-based); has different
* meanings, depending on interrupt mode:
*
* * MSI-X the index in the MSI-X vector table
* * MSI the index of the enabled MSI vectors
* * INTx must be 0
*
* Return: the Linux IRQ number, or -EINVAL if @nr is out of range
*/

From above '0' is a valid irq number.

then in following code:

irq = msi_get_virq(&dev->dev, nr);
return irq ? irq : -EINVAL;

'0' is obviously invalid for msi.

I didn't realize the msi part when reading the patch. It left me in
confusion that '0' is unhandled as here we only check ">0" while in
other places "-EINVAL" is checked.

Not big matter but it sounds slightly clearer to me to follow the
description of pci_irq_vector() instead of its internal detail.