Re: [PATCH RT] vfio-pci: Set MSI/MSI-X ISR to non-threaded

From: Steven Rostedt
Date: Thu Nov 30 2017 - 21:09:10 EST


On Thu, 30 Nov 2017 17:05:35 -0800
David Su <david.w.su@xxxxxxxxx> wrote:

> Setting MSI/MSI-X ISR to be non-threaded will result in shorter and more
> deterministic IRQ delivery latencies to VFIO applications, because
> context switches to the ISR thread are eliminated. This is important
> for applications with low latency requirement running in virtual
> machines on RT Linux host with assigned devices through vfio-pci.
>
> A FPGA based interrupt testing device was used to compare latencies with
> threaded and non-threaded vfio-pci ISR. The device has a free running
> time stamp counter and a register recording the time an interrupt was
> sent to the host. With these registers the device driver and test
> application for the device are able to calculate and record the latency
> between the time an interrupt was sent and the time the ISR in the
> device's driver was invoked.
>
> The result is with non-threaded vfio-pci ISR the average latency is
> reduced by about 54% and the maximum-minimum latency range is reduced by
> about 65%.
>
> Non-threaded vfio-pci ISR:
> Minimum 4.18us, Average 4.47us, Maximum 10.26us
>
> Threaded vfio-pci ISR:
> Minimum 8.97us, Average 9.65us, Maximum 26.11us
>
> Signed-off-by: David Su <david.w.su@xxxxxxxxx>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1c46045..4c54e56 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -333,7 +333,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> pci_write_msi_msg(irq, &msg);
> }
>
> - ret = request_irq(irq, vfio_msihandler, 0,
> + ret = request_irq(irq, vfio_msihandler, IRQF_NO_THREAD,
> vdev->ctx[vector].name, trigger);

Hmm, but we have this:
static irqreturn_t vfio_msihandler(int irq, void *arg)
{
struct eventfd_ctx *trigger = arg;

eventfd_signal(trigger, 1);
return IRQ_HANDLED;
}

__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
{
unsigned long flags;

spin_lock_irqsave(&ctx->wqh.lock, flags);
if (ULLONG_MAX - ctx->count < n)
n = ULLONG_MAX - ctx->count;
ctx->count += n;
if (waitqueue_active(&ctx->wqh))
wake_up_locked_poll(&ctx->wqh, POLLIN);
spin_unlock_irqrestore(&ctx->wqh.lock, flags);

return n;
}

And spin_lock() turns into a mutex in PREEMPT_RT, which means it can
sleep. You can't sleep in hard interrupt context. This will eventually
crash the kernel.

And no, we are not going to convert the ctx->wqh.lock into a
raw_spin_lock.

-- Steve

> if (ret) {
> kfree(vdev->ctx[vector].name);