[PATCH 0/2] Support multiple interrupts for virtio over MMIO devices

From: Jakub Sitnicki
Date: Fri Sep 29 2023 - 16:46:37 EST


# Intro

This patch set enables virtio-mmio devices to use multiple interrupts.

The elevator pitch would be:

"""
To keep the complexity down to a minimum, but at the same time get to the
same performance level as virtio-pci devices, we:

1) keep using the legacy interrupts,
2) have a predefined, device-type-specific mapping of IRQs to virtqueues, and
3) rely on vhost offload for both data and notifications (irqfd/ioeventfd).
"""

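To make point 2 above a bit more concrete, here is a hypothetical sketch
of what a fixed, device-type-specific mapping could look like for a
multi-queue virtio-net device. The vq_to_irq_index() helper and the
modulo scheme are made up for illustration; they are not lifted from the
patches:

static unsigned int vq_to_irq_index(unsigned int vq_index,
                                    unsigned int num_irqs)
{
        /*
         * RX and TX queues of pair k are virtqueues 2k and 2k + 1, so
         * both queues of a pair share one IRQ, and with enough IRQs
         * each pair gets its own interrupt (and its own CPU).
         */
        return (vq_index / 2) % num_irqs;
}
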
As this is an RFC, we aim to (i) present our use case, and (ii) get a sense
of whether we are going in the right direction.

Otherwise, we have kept the changes to a working minimum that already
demonstrates the performance benefits.

At this point, we did not:
- draft any change proposals for the VIRTIO spec, or
- add support for virtio-mmio driver "configuration backends" other than
  the kernel command line, that is, ACPI and DT; see the example below.
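
For illustration, the driver's existing virtio_mmio.device= parameter
takes <size>@<baseaddr>:<irq>[:<id>]. The range notation in the second
line below is only an assumed syntax to convey the idea; it does not
necessarily match the exact format that patch 1 parses:

  virtio_mmio.device=512@0x1e000000:74:3      (today: a single IRQ)
  virtio_mmio.device=512@0x1e000000:74-81:3   (assumed syntax: IRQs 74..81)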

# Motivation

This work aims to enable lightweight VMs (like QEMU microvm, Firecracker,
Cloud Hypervisor), which rely on the virtio MMIO transport, to utilize
multi-queue virtio NICs to their full potential when multiple vCPUs are
available.

Currently, with the MMIO transport, it is not possible to process vNIC queue
events in parallel because there is just one interrupt per virtio-mmio
device, and hence one CPU processing the virtqueue events.

We are looking to change that, so that vNIC performance (measured in pps)
scales with the number of vNIC queues and allocated vCPUs.

Our goal is to reach the same pps level as a virtio-pci vNIC delivers today.

# Prior Work

So far we have seen two attempts at making virtio-mmio devices use multiple
IRQs: first in 2014 [1], then in 2020 [2]. At least, that is all we could
find.

Judging from the discussions and review feedback, the pitfalls in the
previous submissions were:

1. lack of proof that there are performance benefits (see [1]),
2. code complexity (see [2]),
3. no reference VMM (QEMU) implementation ([1] and [2]).

We try not to repeat these mistakes.

[1] https://lore.kernel.org/r/1415093712-15156-1-git-send-email-zhaoshenglong@xxxxxxxxxx/
[2] https://lore.kernel.org/r/cover.1581305609.git.zhabin@xxxxxxxxxxxxxxxxx/

# Benchmark Setup and Results

Traffic flow:

host -> guest (reflect in XDP native) -> host

Host-guest-host with an XDP program reflecting UDP packets is just one of
the production use cases we are interested in.
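
For context, the guest-side reflector is conceptually as simple as the
sketch below. This is our own minimal version, assuming IPv4 without IP
options, and not the exact program used in the benchmark. It swaps the
Ethernet, IP, and UDP addresses/ports and bounces the frame back out of
the RX queue with XDP_TX (swapping addresses leaves the IP and UDP
checksums unchanged):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_udp_reflect(struct xdp_md *ctx)
{
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *iph = (void *)(eth + 1);
        struct udphdr *udp = (void *)(iph + 1);
        __u8 mac[ETH_ALEN];
        __be32 ip;
        __be16 port;

        /* One bounds check at the highest offset covers all headers. */
        if ((void *)(udp + 1) > data_end)
                return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP) ||
            iph->protocol != IPPROTO_UDP)
                return XDP_PASS;

        /* Swap MAC addresses. */
        __builtin_memcpy(mac, eth->h_source, ETH_ALEN);
        __builtin_memcpy(eth->h_source, eth->h_dest, ETH_ALEN);
        __builtin_memcpy(eth->h_dest, mac, ETH_ALEN);

        /* Swap IP addresses. */
        ip = iph->saddr;
        iph->saddr = iph->daddr;
        iph->daddr = ip;

        /* Swap UDP ports. */
        port = udp->source;
        udp->source = udp->dest;
        udp->dest = port;

        return XDP_TX;
}

char _license[] SEC("license") = "GPL";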

Another one is a typical host-to-guest scenario, where UDP flows are
terminated in the guest. The latter, however, takes more work to benchmark
because it requires manual sender throttling to avoid very high losses on
the receiver.

Setup details:

- guest:
  - Linux v6.5 + this patchset
  - 8 vCPUs
  - 16 vNIC queues (8 in use + 8 for lockless XDP TX)
- host:
  - VMM - QEMU v8.1.0 + PoC changes (see below)
  - vhost offload enabled
  - iperf3 v3.12 used as sender and receiver
- traffic pattern:
  - 8 uni-directional, small-packet UDP flows
  - flow steering - one flow per vNIC RX queue
- CPU affinity:
  - iperf clients, iperf servers, KVM vCPU threads, and vhost threads pinned
    to their own logical CPUs
  - all used logical CPUs on the same NUMA node

Recorded receiver pps:

                       virtio-pci       virtio-mmio      virtio-mmio
                       8+8+1 IRQs       8 IRQs           1 IRQ

rx pps (mean ± rsd):   217,743 ± 2.4%   221,741 ± 2.7%   48,910 ± 0.03%
pkt loss (min … max):  1.8% … 2.3%      2.9% … 3.6%      82.1% … 89.3%

rx pps is the average over 8 receivers, each receiving one UDP flow. pkt loss
is not aggregated; the loss for each individual UDP flow falls within the
given range.

If anyone would like to reproduce these results, we would be happy to share
detailed setup steps and tooling (scripts).

# PoC QEMU changes

QEMU is the only VMM known to us where we can compare the performance of the
virtio PCI and MMIO transports with a multi-queue virtio NIC and vhost
offload.

Hence, accompanying these patches, we also have rather raw, and not yet
review-ready, QEMU code changes that we used to test and benchmark a
virtio-mmio device with multiple IRQs.

The tag with changes is available at:

https://github.com/jsitnicki/qemu/commits/virtio-mmio-multi-irq-rfc1
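
At a high level, the per-queue offload from point 3 of the pitch comes
down to KVM's ioeventfd/irqfd plumbing on the VMM side. The sketch below
is our simplified illustration of that wiring, assuming one GSI per
virtqueue; wire_queue() and its parameters are made-up names, error
handling is minimal, and the vhost setup is glossed over, so please do
not read it as an excerpt from the QEMU changes:

#include <linux/kvm.h>
#include <linux/virtio_mmio.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

static int wire_queue(int vm_fd, __u64 mmio_base, __u32 queue_index,
                      __u32 gsi, int *kick_fd, int *call_fd)
{
        /* A guest write of queue_index to the QueueNotify register is
         * turned into a signal on kick_fd in the kernel, without going
         * through VMM userspace. */
        struct kvm_ioeventfd kick = {
                .addr      = mmio_base + VIRTIO_MMIO_QUEUE_NOTIFY,
                .len       = 4,
                .datamatch = queue_index,
                .flags     = KVM_IOEVENTFD_FLAG_DATAMATCH,
        };
        /* A signal on call_fd injects this queue's interrupt (gsi). */
        struct kvm_irqfd call = { .gsi = gsi };

        *kick_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
        *call_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
        if (*kick_fd < 0 || *call_fd < 0)
                return -1;

        kick.fd = *kick_fd;
        call.fd = *call_fd;

        if (ioctl(vm_fd, KVM_IOEVENTFD, &kick) < 0 ||
            ioctl(vm_fd, KVM_IRQFD, &call) < 0)
                return -1;

        /* kick_fd and call_fd would then be handed to vhost via the
         * VHOST_SET_VRING_KICK / VHOST_SET_VRING_CALL ioctls. */
        return 0;
}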

# Open Questions

- Do we need a feature flag, for example VIRTIO_F_MULTI_IRQ, for the guest to
inform the VMM that it understands the feature?

Or can we assume that the VMM assigns multiple IRQs to a virtio-mmio device
only if the guest is compatible?

Looking forward to your feedback.

Jakub Sitnicki (2):
virtio-mmio: Parse a range of IRQ numbers passed on the command line
virtio-mmio: Support multiple interrupts per device

drivers/virtio/virtio_mmio.c | 179 ++++++++++++++++++++++++++++++++-----------
1 file changed, 135 insertions(+), 44 deletions(-)