Re: [PATCH] x86/kvm: disable fast MMIO when running nested

From: Jason Wang
Date: Thu Jan 25 2018 - 09:39:40 EST




On 2018-01-25 22:16, Radim Krčmář wrote:
2018-01-25 01:55-0800, Liran Alon:
----- vkuznets@xxxxxxxxxx wrote:
I was investigating an issue with seabios >= 1.10 which stopped working
for nested KVM on Hyper-V. The problem appears to be in the
handle_ept_misconfig() function: when we do fast mmio we need to skip
the instruction so we do kvm_skip_emulated_instruction(). This, however,
depends on the VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS,
and that is not always the case.

Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
EPT MISCONFIG occurs. While on real hardware it was observed to be set,
some hypervisors follow the spec and don't set it; we end up advancing
IP with some random value.
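To make the failure mode concrete, here is a minimal toy model in C. It is only a sketch: struct toy_vcpu and every toy_* function are hypothetical names invented for illustration, not the kernel's data structures or the patch itself. It assumes the skip simply adds the length field to the instruction pointer, and that a hypervisor which does not fill the field on EPT MISCONFIG leaves the previous exit's value in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for per-vCPU state. Hypothetical; not the kernel's struct. */
struct toy_vcpu {
    uint64_t rip;                  /* guest instruction pointer */
    uint32_t exit_instruction_len; /* models VM_EXIT_INSTRUCTION_LEN */
};

/* Models an exit for which the length field is reported. */
static void toy_exit_with_len(struct toy_vcpu *v, uint32_t len)
{
    v->exit_instruction_len = len;
}

/* Models an EPT MISCONFIG exit where the field is not filled in:
 * it keeps whatever an earlier exit wrote there. */
static void toy_exit_without_len(struct toy_vcpu *v)
{
    (void)v; /* intentionally leaves exit_instruction_len untouched */
}

/* Models the skip: advance RIP by whatever the field currently holds. */
static void toy_skip_instruction(struct toy_vcpu *v)
{
    v->rip += v->exit_instruction_len;
}

/* Models the proposed policy: take the fast path only when not nested. */
static bool toy_fast_mmio_allowed(bool running_nested)
{
    return !running_nested;
}
```

In this model, a skip that follows toy_exit_without_len() advances RIP by the stale length from the earlier exit rather than by the length of the instruction that actually trapped, which is the "random value" described above.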

I checked with Microsoft and they confirmed they don't fill
VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.

Fix the issue by disabling fast mmio when running nested.

Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
---
arch/x86/kvm/vmx.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c829d89e2e63..54afb446f38e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
/*
* A nested guest cannot optimize MMIO vmexits, because we have an
* nGPA here instead of the required GPA.
+ * Skipping instruction below depends on undefined behavior: Intel's
+ * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
+ * when EPT MISCONFIG occurs and while on real hardware it was observed
+ * to be set, other hypervisors (namely Hyper-V) don't set it, we end
+ * up advancing IP with some random value. Disable fast mmio when
+ * running nested and keep it for real hardware in hope that
+ * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
If Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS on EPT_MISCONFIG,
I don't think we should do this on real hardware either.
Neither do I, but you can see the last discussion on this topic,
https://patchwork.kernel.org/patch/9903811/. In short, we've agreed to
limit the hack to real hardware and wait for Intel or virtio changes.

Michael and Jason, any progress on implementing a fast virtio mechanism
that doesn't rely on undefined behavior?

(Encode the length of the writing instruction into the last 4 bits of
the MMIO address, state through a side channel that accesses to the MMIO
area always use a certain instruction length, use a hypercall, ...)
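As a rough illustration of the first option, the sketch below packs an instruction length into the low 4 bits of a 16-byte-aligned notification address and recovers it on the other side. Four bits suffice because an x86 instruction is at most 15 bytes long. All toy_* names and the address layout are assumptions made for this example; nothing here is an existing virtio or KVM interface.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LEN_MASK 0xFULL /* low 4 bits carry the instruction length */

/* Guest side (hypothetical): pick the address to write to so that its
 * low 4 bits equal the length of the writing instruction. */
static uint64_t toy_encode_doorbell(uint64_t base, unsigned insn_len)
{
    assert((base & TOY_LEN_MASK) == 0);      /* base must be 16-byte aligned */
    assert(insn_len >= 1 && insn_len <= 15); /* x86 instructions are 1..15 bytes */
    return base | insn_len;
}

/* Host side (hypothetical): recover the length from the faulting address. */
static unsigned toy_decode_len(uint64_t addr)
{
    return (unsigned)(addr & TOY_LEN_MASK);
}

/* Host side (hypothetical): recover the aligned doorbell base. */
static uint64_t toy_decode_base(uint64_t addr)
{
    return addr & ~TOY_LEN_MASK;
}
```

With such a scheme the host would advance IP by toy_decode_len() of the faulting address instead of reading VM_EXIT_INSTRUCTION_LEN, at the cost of reserving 16 bytes of address space per doorbell.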

Thanks.

No progress from my side. But we can use PIO for virtio 1.0, and it's faster than fast MMIO (qemu supports a modern PIO notification BAR; we can make it the default). It looks to me that neither encoding nor a hypercall will work for a real hardware virtio device.

Thanks