Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on VMware hypervisor

From: Vitaly Kuznetsov
Date: Wed Sep 07 2022 - 11:20:16 EST


Ajay Kaher <akaher@xxxxxxxxxx> writes:

> During boot-time there are many PCI config reads; these can be performed
> either using Port IO instructions (PIO) or memory-mapped I/O (MMIO).
>
> PIO is less efficient than MMIO: it requires twice as many PCI accesses,
> and PIO instructions are serializing. As a result, MMIO should be preferred
> over PIO when possible.
>
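As an aside for readers following along: the "twice as many PCI accesses"
point is because the legacy conf1 mechanism needs an address write to port
0xCF8 followed by a data read from 0xCFC, both serializing PIO instructions
(and two exits when virtualized), whereas ECAM/MMCONFIG is a single MMIO
load. A simplified sketch, loosely based on arch/x86/pci/direct.c and
arch/x86/pci/mmconfig_64.c rather than on code from this patch:

static u32 conf1_read_dword(unsigned int bus, unsigned int devfn, int reg)
{
	/* One OUT to select the address, one IN to fetch the data. */
	outl(0x80000000 | (bus << 16) | (devfn << 8) | (reg & 0xfc), 0xcf8);
	return inl(0xcfc);
}

static u32 ecam_read_dword(void __iomem *mmcfg, unsigned int bus,
			   unsigned int devfn, int reg)
{
	/* A single MMIO load from the mapped config space. */
	return readl(mmcfg + ((bus << 20) | (devfn << 12) | reg));
}
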
> Virtual Machine test result using VMware hypervisor
> 1 hundred thousand reads using raw_pci_read() took:
> PIO: 12.809 seconds
> MMIO: 8.517 seconds (~33.5% faster than PIO)
>
> Currently, when these reads are performed by a virtual machine, they all
> cause a VM-exit, and therefore each one of them induces a considerable
> overhead.
>
> This overhead can be further reduced by mapping the virtual machine's MMIO
> region to a memory area that holds the values the "emulated hardware" is
> supposed to return. The memory region is mapped as "read-only" in the
> NPT/EPT, so reads from these regions are treated as regular memory reads.
> Writes would still be trapped and emulated by the hypervisor.
>
> Virtual Machine test result with the above changes in VMware hypervisor
> 1 hundred thousand reads using raw_pci_read() took:
> PIO: 12.809 seconds
> MMIO: 0.010 seconds
>
> This helps to reduce virtual machine PCI scan and initialization time by
> ~65%: in our case it dropped from ~55 ms to ~18 ms.
>
> MMIO is also faster than PIO on bare-metal systems, but due to some bugs
> with legacy hardware and the smaller gains on bare-metal, it seems prudent
> not to change bare-metal behavior.

Out of curiosity, are we sure MMIO *always* works for other hypervisors
besides VMware? Various Hyper-V versions can probably be tested (were
they?), but with KVM it's much harder, as PCI is emulated in the VMM and
there is certainly more than one in existence...

>
> Signed-off-by: Ajay Kaher <akaher@xxxxxxxxxx>
> ---
> v1 -> v2:
> Limit changes to apply only to VMs [Matthew W.]
> ---
> arch/x86/pci/common.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 45 insertions(+)
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index ddb7986..1e5a8f7 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -20,6 +20,7 @@
> #include <asm/pci_x86.h>
> #include <asm/setup.h>
> #include <asm/irqdomain.h>
> +#include <asm/hypervisor.h>
>
> unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 |
> PCI_PROBE_MMCONF;
> @@ -57,14 +58,58 @@ int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
> return -EINVAL;
> }
>
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +static int vm_raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
> + int reg, int len, u32 *val)
> +{
> + if (raw_pci_ext_ops)
> + return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> + if (domain == 0 && reg < 256 && raw_pci_ops)
> + return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
> + return -EINVAL;
> +}
> +
> +static int vm_raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
> + int reg, int len, u32 val)
> +{
> + if (raw_pci_ext_ops)
> + return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
> + if (domain == 0 && reg < 256 && raw_pci_ops)
> + return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
> + return -EINVAL;
> +}

These look exactly like raw_pci_read()/raw_pci_write() but with inverted
priority. We could've added a parameter, but to be more flexible I'd
suggest adding a 'priority' field to 'struct pci_raw_ops' and making
raw_pci_read()/raw_pci_write() check it before deciding what to use
first. To be on the safe side, you can leave raw_pci_ops's priority
higher than raw_pci_ext_ops's by default and only tweak it in
arch/x86/kernel/cpu/vmware.c.
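
Something along these lines, completely untested and with made-up
field/variable names, just to illustrate the idea:

struct pci_raw_ops {
	int	priority;	/* higher value is tried first */
	int	(*read)(unsigned int domain, unsigned int bus,
			unsigned int devfn, int reg, int len, u32 *val);
	int	(*write)(unsigned int domain, unsigned int bus,
			 unsigned int devfn, int reg, int len, u32 val);
};

int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
		 int reg, int len, u32 *val)
{
	/* PIO (conf1/conf2) only covers domain 0 and the first 256 bytes. */
	bool can_use_pio = domain == 0 && reg < 256 && raw_pci_ops;

	if (raw_pci_ext_ops &&
	    (!can_use_pio ||
	     raw_pci_ext_ops->priority > raw_pci_ops->priority))
		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
	if (can_use_pio)
		return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
	return -EINVAL;
}

raw_pci_write() would mirror this, the default priorities would preserve
today's ordering, and vmware.c would only have to bump
raw_pci_ext_ops->priority.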

> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
> static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
> {
> +#ifdef CONFIG_HYPERVISOR_GUEST
> + /*
> + * MMIO is faster than PIO, but due to some bugs with legacy
> + * hardware, it seems prudent to prefer MMIO for VMs and PIO
> + * for bare-metal.
> + */
> + if (!hypervisor_is_type(X86_HYPER_NATIVE))
> + return vm_raw_pci_read(pci_domain_nr(bus), bus->number,
> + devfn, where, size, value);
> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
> return raw_pci_read(pci_domain_nr(bus), bus->number,
> devfn, where, size, value);
> }
>
> static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
> {
> +#ifdef CONFIG_HYPERVISOR_GUEST
> + /*
> + * MMIO is faster than PIO, but due to some bugs with legacy
> + * hardware, it seems prudent to prefer MMIO for VMs and PIO
> + * for bare-metal.
> + */
> + if (!hypervisor_is_type(X86_HYPER_NATIVE))
> + return vm_raw_pci_write(pci_domain_nr(bus), bus->number,
> + devfn, where, size, value);
> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
> return raw_pci_write(pci_domain_nr(bus), bus->number,
> devfn, where, size, value);
> }

--
Vitaly