Re: [PATCH V4 10/11] vfio/pci: Support dynamic MSI-X

From: Reinette Chatre
Date: Fri Apr 28 2023 - 14:35:20 EST


Hi Kevin,

On 4/27/2023 11:50 PM, Tian, Kevin wrote:
>> From: Chatre, Reinette <reinette.chatre@xxxxxxxxx>
>> Sent: Friday, April 28, 2023 1:36 AM
>>
>> pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be
>> allocated after MSI-X enabling.
>>
>> Use dynamic MSI-X (if supported by the device) to allocate an interrupt
>> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at
>> the time a valid eventfd is assigned. This is different behavior from
>> a range provided during MSI-X enabling where interrupts are allocated
>> for the entire range whether a valid eventfd is provided for each
>> interrupt or not.
>>
>> The PCI-MSIX API requires that some number of irqs are allocated for
>> an initial set of vectors when enabling MSI-X on the device. When
>> dynamic MSIX allocation is not supported, the vector table, and thus
>> the allocated irq set can only be resized by disabling and re-enabling
>> MSI-X with a different range. In that case the irq allocation is
>> essentially a cache for configuring vectors within the previously
>> allocated vector range. When dynamic MSI-X allocation is supported,
>> the API still requires some initial set of irqs to be allocated, but
>> also supports allocating and freeing specific irq vectors both
>> within and beyond the initially allocated range.
>>
>> For consistency between modes, as well as to reduce latency and improve
>> reliability of allocations, and also simplicity, this implementation
>> only releases irqs via pci_free_irq_vectors() when either the interrupt
>> mode changes or the device is released.
>
> It improves the reliability of allocations from the calling device p.o.v.
>
> But system-wide this is not efficient use of irqs and not releasing them
> timely may affect the reliability of allocations for other devices.

Could you please elaborate how other devices may be impacted?

> Should this behavior be something configurable?

This is not clear to me and I look to you for guidance here. From practical
side it looks like configuration via module parameters is supported but
whether it should be done is not clear to me.

When considering this we need to think about what the user may expect when
turning on/off the configuration. For example, MSI-X continues to allocate a
range of interrupts during enabling. These have always been treated as a
"cache" (interrupts remain allocated, whether they have an associated trigger
or not). If there is new configurable behavior, do you expect that the
driver needs to distinguish between the original "cache" that the user is
used to and the new dynamic allocations? That is, should a dynamic MSI-X
capable device always free interrupts when user space removes an eventfd
or should only interrupts that were allocated dynamically be freed dynamically?

>> +/*
>> + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector.
>> + * If a Linux IRQ number is not available then a new interrupt will be
>> + * allocated if dynamic MSI-X is supported.
>> + */
>> +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
>> + unsigned int vector, bool msix)
>> +{
>> + struct pci_dev *pdev = vdev->pdev;
>> + struct msi_map map;
>> + int irq;
>> + u16 cmd;
>> +
>> + irq = pci_irq_vector(pdev, vector);
>> + if (irq > 0 || !msix || !vdev->has_dyn_msix)
>> + return irq;
>
> if (irq >= 0 || ...)
>

I am not sure about this request because pci_irq_vector() cannot return 0.
The Linux interrupt number will be > 0 on success. 0 means "not found"
(see msi_get_virq()), which is translated to -EINVAL by pci_irq_vector().

>> +
>> +/*
>> + * Where is vfio_msi_free_irq() ?
>> + *
>> + * Allocated interrupts are maintained, essentially forming a cache that
>> + * subsequent allocations can draw from. Interrupts are freed using
>> + * pci_free_irq_vectors() when MSI/MSI-X is disabled.
>> + */
>
> Probably merge it with the comment of vfio_msi_alloc_irq()?

Sure, will do.

>
>> @@ -401,6 +430,12 @@ static int vfio_msi_set_vector_signal(struct
>> vfio_pci_core_device *vdev,
>> if (fd < 0)
>> return 0;
>>
>> + if (irq == -EINVAL) {
>> + irq = vfio_msi_alloc_irq(vdev, vector, msix);
>> + if (irq < 0)
>> + return irq;
>> + }
>> +
>> ctx = vfio_irq_ctx_alloc(vdev, vector);
>> if (!ctx)
>> return -ENOMEM;
>
> This doesn't read clean that an irq is allocated but not released
> in the error unwind.

I can add a comment similar to the location where the trigger is released:
/* Interrupt stays allocated, will be freed at MSI-X disable. */

Reinette