Re: [PATCH v2] KVM: VMX: Enable Notify VM exit

From: Xiaoyao Li
Date: Tue Sep 07 2021 - 09:45:20 EST


On 9/3/2021 12:36 AM, Sean Christopherson wrote:
On Thu, Sep 02, 2021, Sean Christopherson wrote:
On Tue, Aug 03, 2021, Xiaoyao Li wrote:
On 8/2/2021 11:46 PM, Sean Christopherson wrote:
@@ -5642,6 +5653,31 @@ static int handle_bus_lock_vmexit(struct kvm_vcpu *vcpu)
 	return 0;
 }
+static int handle_notify(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qual = vmx_get_exit_qual(vcpu);
+
+	if (!(exit_qual & NOTIFY_VM_CONTEXT_INVALID)) {

What does CONTEXT_INVALID mean? The ISE doesn't provide any information whatsoever.

It indicates whether the VM context in the VMCS is corrupted and no longer valid.

Well that's a bit terrifying. Under what conditions can the VM context become
corrupted? E.g. if the context can be corrupted by an inopportune NOTIFY exit,
then KVM needs to be ultra conservative as a false positive could be fatal to a
guest.


The short answer is that no case will set the VM_CONTEXT_INVALID bit.

But something must set it, otherwise it wouldn't exist.

On existing Intel silicon, no case will set it. A new case might set it in the future.

The condition(s) under
which it can be set matters because it affects how KVM should respond. E.g. if
the guest can trigger VM_CONTEXT_INVALID at will, then we should probably treat
it as a shutdown and reset the VMCS.

Oh, and "shutdown" would be relative to the VMCS, i.e. if L2 triggers a NOTIFY
exit with VM_CONTEXT_INVALID then KVM shouldn't kill the entire VM. The least
awful option would probably be to synthesize a shutdown VM-Exit to L1. That
won't communicate to L1 that vmcs12 state is stale/bogus, but I don't see any way
to handle that via an existing VM-Exit reason :-/
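
For illustration only, the per-VMCS policy described above could be modeled in plain user-space C roughly as follows. This is a sketch, not KVM code: the enum, notify_policy(), and the is_l2 flag are hypothetical names, the bit position of NOTIFY_VM_CONTEXT_INVALID is an assumption, and resuming the guest when the context is still valid is also an assumption, since the quoted hunk does not show that branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit position picked for illustration only. */
#define NOTIFY_VM_CONTEXT_INVALID (1ULL << 0)

/* Hypothetical outcomes; these are not real KVM return codes. */
enum notify_action {
	NOTIFY_RESUME_GUEST,	/* context still valid, keep running */
	NOTIFY_SHUTDOWN_TO_L1,	/* L2 hit it: synthesize a shutdown exit to L1 */
	NOTIFY_SHUTDOWN_VCPU,	/* L1 hit it: treat as shutdown of this vCPU */
};

/*
 * "Shutdown" is relative to the VMCS that took the exit, so an invalid
 * context seen while running L2 is reported to L1 rather than killing
 * the whole VM.
 */
static inline enum notify_action notify_policy(uint64_t exit_qual, bool is_l2)
{
	if (!(exit_qual & NOTIFY_VM_CONTEXT_INVALID))
		return NOTIFY_RESUME_GUEST;

	return is_l2 ? NOTIFY_SHUTDOWN_TO_L1 : NOTIFY_SHUTDOWN_VCPU;
}

/* Small usage example covering the cases in the decision table. */
static inline void notify_policy_self_check(void)
{
	assert(notify_policy(0, false) == NOTIFY_RESUME_GUEST);
	assert(notify_policy(0, true) == NOTIFY_RESUME_GUEST);
	assert(notify_policy(NOTIFY_VM_CONTEXT_INVALID, true) ==
	       NOTIFY_SHUTDOWN_TO_L1);
	assert(notify_policy(NOTIFY_VM_CONTEXT_INVALID, false) ==
	       NOTIFY_SHUTDOWN_VCPU);
}
```

Note that the model cannot express the open problem raised above: nothing here tells L1 that vmcs12 state is stale.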

But if VM_CONTEXT_INVALID can occur if and only if there's a hardware/ucode
issue, then we can do:

	if (KVM_BUG_ON(exit_qual & NOTIFY_VM_CONTEXT_INVALID, vcpu->kvm))
		return -EIO;

Either way, to enable this by default we need some form of documentation that
describes what conditions lead to VM_CONTEXT_INVALID.

I still don't see why the conditions that lead to it matter. I think the consensus is that once VM_CONTEXT_INVALID happens, the vCPU can no longer run. Shouldn't either KVM_BUG_ON() or a specific exit to userspace be OK?
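
To make the two options concrete, here is a rough user-space model of how they differ from the caller's point of view. It is only a sketch: struct model_vcpu, MODEL_EXIT_NOTIFY, and both handler names are hypothetical, and the bit position of NOTIFY_VM_CONTEXT_INVALID is an assumption. The only thing borrowed from KVM is the exit-handler return convention (1 resumes the guest, 0 exits to userspace, a negative value is an error).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit position picked for illustration only. */
#define NOTIFY_VM_CONTEXT_INVALID (1ULL << 0)
/* Hypothetical exit reason reported to userspace in this model. */
#define MODEL_EXIT_NOTIFY 1

/* Hypothetical stand-in for the state each option would touch. */
struct model_vcpu {
	bool vm_bugged;		/* set by the KVM_BUG_ON()-style path */
	int exit_reason;	/* set by the exit-to-userspace path */
};

/* Option A: treat an invalid context as a KVM bug; the VM is dead. */
static inline int handle_invalid_as_bug(struct model_vcpu *v, uint64_t qual)
{
	if (qual & NOTIFY_VM_CONTEXT_INVALID) {
		v->vm_bugged = true;
		return -EIO;
	}
	return 1;
}

/* Option B: report it and let userspace decide what to do with the vCPU. */
static inline int handle_invalid_as_exit(struct model_vcpu *v, uint64_t qual)
{
	if (qual & NOTIFY_VM_CONTEXT_INVALID) {
		v->exit_reason = MODEL_EXIT_NOTIFY;
		return 0;
	}
	return 1;
}

/* Small usage example: same input, different outcome per option. */
static inline void notify_options_self_check(void)
{
	struct model_vcpu a = { 0 }, b = { 0 };

	assert(handle_invalid_as_bug(&a, NOTIFY_VM_CONTEXT_INVALID) == -EIO);
	assert(a.vm_bugged);
	assert(handle_invalid_as_exit(&b, NOTIFY_VM_CONTEXT_INVALID) == 0);
	assert(b.exit_reason == MODEL_EXIT_NOTIFY && !b.vm_bugged);
	assert(handle_invalid_as_bug(&a, 0) == 1);
	assert(handle_invalid_as_exit(&b, 0) == 1);
}
```

In both variants the vCPU stops running; the difference is whether the whole VM is marked bugged or userspace gets a chance to react.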