Re: [PATCH] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps

From: Sean Christopherson
Date: Mon May 22 2023 - 12:37:56 EST


On Mon, May 22, 2023, Dave Hansen wrote:
> On 5/18/23 10:33, Sean Christopherson wrote:
> >
> > 2. I'm pretty sure conditioning mmio_stale_data_clear on kvm_arch_has_assigned_device()
> > is a bug. AIUI, the vulnerability applies to _any_ MMIO accesses. Assigning
> > a device is necessary to let the device DMA into the guest, but it's not
> > necessary to let the guest access MMIO addresses; that's done purely via
> > memslots.
>
> Just to make sure we're all on the same page: KVM needs mitigations when
> real, hardware MMIO is exposed to the guest. None of this has anything
> to do with virtio or what guests _normally_ see as devices or MMIO. Right?

Yes. I try to always call MMIO that is handled by a synthetic/virtual/emulated
device "emulated MMIO", specifically to differentiate between the two cases.

> But direct device assignment does that "real hardware MMIO" for sure
> because it's mapping parts of the PCI address space (which is all MMIO)
> into the guest. That's what the kvm_arch_has_assigned_device() check
> was going for.
>
> But I think you're also saying that, in the end, memory gets exposed to
> the guest by KVM userspace setting up a memslot. KVM userspace _could_
> have mapped a piece of MMIO and could just pass that down to a guest
> without kvm_arch_has_assigned_device() being involved. That makes the
> kvm_arch_has_assigned_device() check useless.

Yep.

> In other words, all guests with kvm_arch_has_assigned_device() need
> mitigation.

Yes, assuming the guest wants to actually use the device :-)

> But there are potentially situations where the guest can see real hardware MMIO
> and yet also be !kvm_arch_has_assigned_device().

Yes. There may or may not be _legitimate_ scenarios for exposing host MMIO to the
guest without an assigned device, but as far as the mitigation is concerned,
legitimacy doesn't matter; all that matters is that userspace can expose host MMIO
to the guest irrespective of VFIO.

FWIW, I think this would be a minimal fix that avoids applying the mitigation
blindly. My only concern is that there might be gaps in the kvm_is_mmio_pfn()
heuristic, but if that's the case then KVM likely has other issues, e.g. KVM would
potentially map MMIO with the wrong memtype.

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2865c3cb3501..ac3c535ae3b9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1274,6 +1274,7 @@ struct kvm_arch {

 	bool apic_access_memslot_enabled;
 	bool apic_access_memslot_inhibited;
+	bool vm_has_passthrough_mmio;

 	/* Protects apicv_inhibit_reasons */
 	struct rw_semaphore apicv_update_lock;
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index cf2c6426a6fc..83d235488e56 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -189,6 +189,10 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (level > PG_LEVEL_4K)
 		spte |= PT_PAGE_SIZE_MASK;

+	if (static_branch_unlikely(&mmio_stale_data_clear) &&
+	    !vcpu->kvm->arch.vm_has_passthrough_mmio && kvm_is_mmio_pfn(pfn))
+		vcpu->kvm->arch.vm_has_passthrough_mmio = true;
+
 	if (shadow_memtype_mask)
 		spte |= static_call(kvm_x86_get_mt_mask)(vcpu, gfn,
 							 kvm_is_mmio_pfn(pfn));
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..9c66ba35af92 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7159,7 +7159,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	else if (static_branch_unlikely(&mds_user_clear))
 		mds_clear_cpu_buffers();
 	else if (static_branch_unlikely(&mmio_stale_data_clear) &&
-		 kvm_arch_has_assigned_device(vcpu->kvm))
+		 vcpu->kvm->arch.vm_has_passthrough_mmio)
 		mds_clear_cpu_buffers();

 	vmx_disable_fb_clear(vmx);
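
For anyone following along, the intent of the patch can be modeled in plain
userspace C. This is only a sketch of the sticky-flag logic, not kernel code:
struct model_vm, model_mmio_stale_data_clear, model_map_pfn() and
model_needs_buffer_clear() are hypothetical names invented for illustration,
and the bool argument stands in for whatever kvm_is_mmio_pfn() would return.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM state; not the real struct kvm_arch. */
struct model_vm {
	bool has_passthrough_mmio;	/* sticky: set once, never cleared */
};

/* Plain bool standing in for the mmio_stale_data_clear static key. */
static bool model_mmio_stale_data_clear = true;

/*
 * Models the make_spte() hunk: the first time a host MMIO pfn is mapped
 * into the guest, latch the per-VM flag.  Later mappings never clear it.
 */
static void model_map_pfn(struct model_vm *vm, bool pfn_is_host_mmio)
{
	if (model_mmio_stale_data_clear && !vm->has_passthrough_mmio &&
	    pfn_is_host_mmio)
		vm->has_passthrough_mmio = true;
}

/*
 * Models the vmx_vcpu_enter_exit() hunk: the buffer clear is needed only
 * if the mitigation is enabled and the VM has ever mapped host MMIO,
 * regardless of whether a device was assigned.
 */
static bool model_needs_buffer_clear(const struct model_vm *vm)
{
	return model_mmio_stale_data_clear && vm->has_passthrough_mmio;
}
```

In this model a VM that only ever maps regular memory never pays for the
buffer clear, while a VM that maps host MMIO even once pays for it on every
subsequent entry, which is the trade-off the flag is meant to capture.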