Re: [PATCH 15/17] vfio/pci: Let enable and disable of interrupt types use same signature

From: Reinette Chatre
Date: Wed Feb 14 2024 - 14:37:25 EST


Hi Alex,

Apologies for the delay. This was due to two parts:
* I studied these code paths more and I think there may be one more
flow that can be made more robust (more later),
* I spent a lot of time trying to trigger all the affected code paths but
here I did not have much luck. I would prefer to run tests that trigger
these flows so that I can get some help from the kernel lock debugging.
From what I can tell the irqfd flows can be triggered with the help of the
kernel-irqchip option to Qemu but on the hardware I have access to I
could only try kernel-irqchip=auto and that did not trigger the flows
(the irqfd is set up but the eventfd is never signaled).
I am not familiar with this area, do you perhaps have guidance on how
I can test these flows or do you perhaps have access to needed environments
to test this?

On 2/8/2024 1:08 PM, Alex Williamson wrote:
> On Wed, 7 Feb 2024 15:30:15 -0800
> Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:

>> I studied the code more and have one more observation related to this portion
>> of the flow:
>> From what I can tell this change makes the INTx code more robust. If I
>> understand current implementation correctly it seems possible to enable
>> INTx but not have interrupt allocated. In this case the interrupt context
>> (ctx) will exist but ctx->trigger will be NULL. Current
>> vfio_pci_set_intx_trigger()->vfio_send_intx_eventfd() only checks if
>> ctx is valid. It looks like it may call eventfd_signal(NULL) where
>> pointer is dereferenced.
>>
>> If this is correct then I think a separate fix that can easily be
>> backported may be needed. Something like:
>
> Good find. I think it's a bit more complicated though. There are
> several paths to vfio_send_intx_eventfd:
>
> - vfio_intx_handler
>
> This can only be called between request_irq() and free_irq()
> where trigger is always valid. igate is not held.
>
> - vfio_pci_intx_unmask
>
> Callers hold igate, additional test of ctx->trigger makes this
> safe.

Two callers of vfio_pci_intx_unmask() do not seem to hold igate:
vfio_basic_config_write() and vfio_pci_core_runtime_resume().

Considering this I wonder if we could add something like below to the
solution you propose. On a high level the outside callers (VFIO PCI core)
will keep using vfio_pci_intx_unmask() that will now take igate while
the interrupt management code gets a new internal __vfio_pci_intx_unmask()
that should be called with igate held. This results in:

@@ -215,12 +223,20 @@ static int vfio_pci_intx_unmask_handler_virqfd(void *opaque, void *unused)
return ret;
}

-void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
+static void __vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
{
+ lockdep_assert_held(&vdev->igate);
if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
vfio_send_intx_eventfd(vdev, NULL);
}

+void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
+{
+ mutex_lock(&vdev->igate);
+ __vfio_pci_intx_unmask(vdev);
+ mutex_unlock(&vdev->igate);
+}
+
static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
{
struct vfio_pci_core_device *vdev = dev_id;
@@ -581,11 +597,11 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
return -EINVAL;

if (flags & VFIO_IRQ_SET_DATA_NONE) {
- vfio_pci_intx_unmask(vdev);
+ __vfio_pci_intx_unmask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
uint8_t unmask = *(uint8_t *)data;
if (unmask)
- vfio_pci_intx_unmask(vdev);
+ __vfio_pci_intx_unmask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0);
int32_t fd = *(int32_t *)data;

>
> - vfio_pci_set_intx_trigger
>
> Same as above.
>
> - Through unmask eventfd (virqfd)
>
> Here be dragons.
>
> In the virqfd case, a write to the eventfd calls virqfd_wakeup() where
> we'll call the handler, vfio_pci_intx_unmask_handler(), and based on
> the result schedule the thread, vfio_send_intx_eventfd(). Both of
> these look suspicious. They're not called under igate, so testing
> ctx->trigger doesn't resolve the race.
>
> I think an option is to wrap the virqfd entry points in igate where we
> can then do something similar to your suggestion. I don't think we
> want to WARN_ON(!ctx->trigger) because that's then a user reachable
> condition. Instead we can just quietly follow the same exit paths.
>
> I think that means we end up with something like this:
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 237beac83809..ace7e1dbc607 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -92,12 +92,21 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
> struct vfio_pci_irq_ctx *ctx;
>
> ctx = vfio_irq_ctx_get(vdev, 0);
> - if (WARN_ON_ONCE(!ctx))
> + if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
> return;
> eventfd_signal(ctx->trigger);
> }
> }
>
> +static void vfio_send_intx_eventfd_virqfd(void *opaque, void *unused)
> +{
> + struct vfio_pci_core_device *vdev = opaque;
> +
> + mutex_lock(&vdev->igate);
> + vfio_send_intx_eventfd(opaque, unused);
> + mutex_unlock(&vdev->igate);
> +}
> +
> /* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
> bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
> {
> @@ -170,7 +179,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
> }
>
> ctx = vfio_irq_ctx_get(vdev, 0);
> - if (WARN_ON_ONCE(!ctx))
> + if (WARN_ON_ONCE(!ctx) || !ctx->trigger)
> goto out_unlock;
>
> if (ctx->masked && !vdev->virq_disabled) {
> @@ -194,6 +203,18 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
> return ret;
> }
>
> +static int vfio_pci_intx_unmask_handler_virqfd(void *opaque, void *unused)
> +{
> + struct vfio_pci_core_device *vdev = opaque;
> + int ret;
> +
> + mutex_lock(&vdev->igate);
> + ret = vfio_pci_intx_unmask_handler(opaque, unused);
> + mutex_unlock(&vdev->igate);
> +
> + return ret;
> +}
> +
> void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
> {
> if (vfio_pci_intx_unmask_handler(vdev, NULL) > 0)
> @@ -572,10 +593,10 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
> if (WARN_ON_ONCE(!ctx))
> return -EINVAL;
> if (fd >= 0)
> - return vfio_virqfd_enable((void *) vdev,
> - vfio_pci_intx_unmask_handler,
> - vfio_send_intx_eventfd, NULL,
> - &ctx->unmask, fd);
> + return vfio_virqfd_enable((void *)vdev,
> + vfio_pci_intx_unmask_handler_virqfd,
> + vfio_send_intx_eventfd_virqfd, NULL,
> + &ctx->unmask, fd);
>
> vfio_virqfd_disable(&ctx->unmask);
> }
>
>
> WDYT?

This looks good to me. Thank you very much for taking the time to
write it.

Reinette