[ANNOUNCE] PUCK Notes - 2024.01.24 - Memtypes for non-coherent DMA

From: Sean Christopherson
Date: Wed Jan 24 2024 - 13:25:00 EST


Recording and slides:

https://drive.google.com/corp/drive/folders/18QbkitOXcZyYXpT558wXf9Hs-rQs8mhX?resourcekey=0-qOuxyhLUBUGlCwHrzPqAkQ

Key Takeaways:

- On Intel CPUs, CPU<->CPU accesses remain coherent even when the guest uses
WC/UC memtypes, so KVM can honor guest PAT for all VMs without putting the
host or guest at risk. I.e. KVM x86 doesn't need new uAPI; we can simply
delete the code that sets IPAT (Ignore PAT) in EPT entries.

- Intel CPUs need an SFENCE after VM-Exit, but there's already an smp_mb()
buried in srcu_read_lock(), and KVM uses SRCU to protect memslots, i.e. an
SFENCE is guaranteed to execute after VM-Exit and before KVM (or userspace)
accesses guest memory. TODO: add and use smp_mb__after_srcu_read_lock() to
pair with smp_mb__after_srcu_read_unlock() and document the need for a
barrier on Intel.

- IOMMU (via VFIO/IOMMUFD) mappings need cache flush operations on map() and
unmap() to prevent the guest from using non-coherent DMA to read stale data
on x86 (and likely other architectures).

- ARM's architecture doesn't guarantee coherency for mismatched memtypes, so
KVM still needs to figure out a solution for ARM, and possibly RISC-V as
well. But for CPU<->CPU access, KVM guarantees host safety, just not
functional correctness for the guest, i.e. finding a solution can likely be
deferred until a use case comes along.

- CPU<->Device coherency on ARM is messy and needs further discussion.

- GPU drivers flush caches when mapping and unmapping buffers, so the existing
virtio GPU use case is ok (though ideally it would be ported to use IOMMUFD's
mediated device support).

- Virtio GPU guest drivers are responsible for using CLFLUSH{OPT} instead of
WBINVD (which is intercepted and ignored by KVM).

- KVM x86's support for virtualizing MTRRs on Intel CPUs can also be dropped
(it was effectively a workaround for ignoring guest PAT).