Re: [PATCH V3 09/10] vfio/pci: Support dynamic MSI-X

From: Reinette Chatre
Date: Wed Apr 19 2023 - 18:04:16 EST


Hi Alex,

On 4/19/2023 2:38 PM, Alex Williamson wrote:
> On Wed, 19 Apr 2023 11:13:29 -0700
> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>> On 4/18/2023 3:38 PM, Alex Williamson wrote:
>>> On Tue, 18 Apr 2023 10:29:20 -0700
>>> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>>>
>>>> Recently introduced pci_msix_alloc_irq_at() and pci_msix_free_irq()
>>>> enables an individual MSI-X interrupt to be allocated and freed after
>>>> MSI-X enabling.
>>>>
>>>> Use dynamic MSI-X (if supported by the device) to allocate an interrupt
>>>> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at
>>>> the time a valid eventfd is assigned. This is different behavior from
>>>> a range provided during MSI-X enabling where interrupts are allocated
>>>> for the entire range whether a valid eventfd is provided for each
>>>> interrupt or not.
>>>>
>>>> Do not dynamically free interrupts, leave that to when MSI-X is
>>>> disabled.
>>>
>>> But we do, sometimes, even if essentially only on the error path. Is
>>> that worthwhile? It seems like we could entirely remove
>>> vfio_msi_free_irq() and rely only on pci_free_irq_vectors() on MSI/X
>>> teardown.
>>
>> Yes, it is only on the error path where dynamic MSI-X interrupts are
>> removed. I do not know how to determine if it is worthwhile. On the
>> kernel side failure seems unlikely since it would mean memory cannot
>> be allocated or request_irq() failed. In these cases it may not be
>> worthwhile since user space may try again and having the interrupt
>> already allocated would be helpful. The remaining error seems to be
>> if user space provided an invalid eventfd. An allocation in response
>> to wrong user input is a concern to me. Should we consider
>> buggy/malicious users? I am uncertain here so would defer to your
>> guidance.
>
> I don't really see that a malicious user can exploit anything here,
> their irq allocation is bound by the device support and they're
> entitled to make use of the full vector set of the device by virtue of
> having ownership of the device. All the MSI-X allocated irqs are freed
> when the interrupt mode is changed or the device is released regardless.
>
> The end result is also no different than if the user had not configured
> the vector when enabling MSI-X or configured it and later de-configured
> with a -1 eventfd. The irq is allocated but not attached to a ctx.
> We're intentionally using this as a cache.
>
> Also, as implemented here in v3, we might be freeing from the original
> allocation rather than a new, dynamically allocated irq.

Great point.

>
> My thinking is that if we think there's a benefit to caching any
> allocated irqs, we should do so consistently. We don't currently know
> if the irq was allocated now or previously. Tracking that would add
> complexity for little benefit. The user can get to the same end result
> of an allocated, unused irq in numerous way, the state itself is not
> erroneous, and is actually in support of caching irq allocations.
> Removing the de-allocation of a single vector further simplifies the
> code as there exists only one path where irqs are freed, ie.
> pci_free_irq_vectors().
>
> So I'd lean towards removing vfio_msi_free_irq().

Thank you for your detailed analysis. I understand and agree.
I will remove vfio_msi_free_irq().

>
>>> I'd probably also add a comment in the commit log about the theory
>>> behind not dynamically freeing irqs, ie. latency, reliability, and
>>> whatever else we used to justify it. Thanks,
>>
>> Sure. How about something like below to replace the final sentence of
>> the changelog:
>>
>> "When a guest disables an interrupt, user space (Qemu) does not
>> disable the interrupt but instead assigns it a different trigger. A
>> common flow is thus for the VFIO_DEVICE_SET_IRQS ioctl() to be
>> used to assign a new eventfd to an already enabled interrupt. Freeing
>> and re-allocating an interrupt in this scenario would add unnecessary
>> latency as well as uncertainty since the re-allocation may fail. Do
>> not dynamically free interrupts when an interrupt is disabled, instead
>> support a subsequent re-enable to draw from the initial allocation when
>> possible. Leave freeing of interrupts to when MSI-X is disabled."
>
> There are other means besides caching irqs that could achieve the above,
> for instance if a trigger is simply swapped from one eventfd to another,
> that all happens within vfio_msi_set_vector_signal() where we could
> hold the irq for the transition.
>
> I think I might justify it as:
>
> The PCI-MSIX API requires that some number of irqs are
> allocated for an initial set of vectors when enabling MSI-X on
> the device. When dynamic MSIX allocation is not supported, the
> vector table, and thus the allocated irq set can only be resized
> by disabling and re-enabling MSIX with a different range. In
> that case the irq allocation is essentially a cache for
> configuring vectors within the previously allocated vector
> range. When dynamic MSIX allocation is supported, the API
> still requires some initial set of irqs to be allocated, but
> also supports allocating and freeing specific irq vectors both
> within and beyond the initially allocated range.
>
> For consistency between modes, as well as to reduce latency and
> improve reliability of allocations, and also simplicity, this
> implementation only releases irqs via pci_free_irq_vectors()
> when either the interrupt mode changes or the device is
> released.
>
> Does that cover the key points for someone that might want to revisit
> this decision later? Thanks,

It does so clearly, yes. Thank you so much for taking the time to
write this. I will include it in the changelog.

Reinette