[PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction

From: Michal Wilczynski
Date: Mon Jan 22 2024 - 19:56:03 EST


Syzkaller found a warning triggered in nested_vmx_vmexit().
vmx->nested.nested_run_pending is non-zero, even though we're in
nested_vmx_vmexit(). Generally, trying to cancel a pending entry is
considered a bug. However in this particular scenario, the kernel
behavior seems correct.

Syzkaller scenario:
1) Set up VCPU's
2) Run some code with KVM_RUN in L2 as a nested guest
3) Return from KVM_RUN
4) Inject KVM_SMI into the VCPU
5) Change the EFER register with KVM_SET_SREGS to value 0x2501
6) Run some code on the VCPU using KVM_RUN
7) Observe following behavior:

kvm_smm_transition: vcpu 0: entering SMM, smbase 0x30000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8002
kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x30000
kvm_nested_vmenter: rip: 0x0000000000008002 vmcs: 0x0000000000007000
nested_rip: 0x0000000000000000 int_ctl: 0x00000000
event_inj: 0x00000000 nested_ept=n guest
cr3: 0x0000000000002000
kvm_nested_vmexit_inject: reason: TRIPLE_FAULT ext_inf1: 0x0000000000000000
ext_inf2: 0x0000000000000000 ext_int: 0x00000000
ext_int_err: 0x00000000

What happened here is an SMI was injected immediately and the handler was
called at address 0x8000; all is good. Later, an RSM instruction is
executed in an emulator to return to the nested VM. em_rsm() is called,
which leads to emulator_leave_smm(). A part of this function calls VMX/SVM
callback, in this case vmx_leave_smm(). It attempts to set up a pending
reentry to guest VM by calling nested_vmx_enter_non_root_mode() and sets
vmx->nested.nested_run_pending to one. Unfortunately, later in
emulator_leave_smm(), rsm_load_state_64() fails to write invalid EFER to
the MSR. This results in em_rsm() calling triple_fault callback. At this
point it's clear that the KVM should call the vmexit, but
vmx->nested.nested_run_pending is left set to 1.

Similar flow goes for SVM, as the bug also reproduces on AMD platforms.

To address this issue, reset the nested_run_pending flag in the
triple_fault handler. However, it's crucial to note that
nested_pending_run cannot be cleared in all cases. It should only be
cleared for the specific instruction requiring hardware VM-Enter to
complete the emulation, such as RSM. Previously, there were instances
where KVM prematurely synthesized a triple fault on nested VM-Enter. In
these cases, it is not appropriate to zero the nested_pending_run.

To resolve this, introduce a new emulator flag indicating the need for
HW VM-Enter to complete emulating RSM. Based on this flag, a decision can
be made in vendor-specific triple fault handlers about whether
nested_pending_run needs to be cleared.

Fixes: 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM")
Reported-by: Zheyu Ma <zheyuma97@xxxxxxxxx>
Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@xxxxxxxxxxxxxx
Signed-off-by: Michal Wilczynski <michal.wilczynski@xxxxxxxxx>
---
v2:
- added new emulator flags indicating whether an instruction needs a
VM-Enter to complete emulation (Sean)
- fix in SVM nested triple_fault handler (Sean)
- only clear nested_run_pending on RSM instruction (Sean)

arch/x86/kvm/emulate.c | 4 +++-
arch/x86/kvm/kvm_emulate.h | 2 ++
arch/x86/kvm/svm/nested.c | 9 +++++++++
arch/x86/kvm/vmx/nested.c | 12 ++++++++++++
4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e223043ef5b2..889460432eac 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -178,6 +178,7 @@
#define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
#define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
#define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
+#define NeedVMEnter ((u64)1 << 57) /* Instruction needs HW VM-Enter to complete */

#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)

@@ -4462,7 +4463,7 @@ static const struct opcode twobyte_table[256] = {
F(DstMem | SrcReg | Src2CL | ModRM, em_shld), N, N,
/* 0xA8 - 0xAF */
I(Stack | Src2GS, em_push_sreg), I(Stack | Src2GS, em_pop_sreg),
- II(EmulateOnUD | ImplicitOps, em_rsm, rsm),
+ II(EmulateOnUD | ImplicitOps | NeedVMEnter, em_rsm, rsm),
F(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_bts),
F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shrd),
F(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
@@ -4966,6 +4967,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
}

ctxt->is_branch = opcode.flags & IsBranch;
+ ctxt->need_vm_enter = opcode.flags & NeedVMEnter;

/* Unrecognised? */
if (ctxt->d == 0)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index e6d149825169..1e1366afa51d 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -372,6 +372,8 @@ struct x86_emulate_ctxt {
struct read_cache io_read;
struct read_cache mem_read;
bool is_branch;
+ /* instruction need a HW VM-Enter to complete correctly */
+ bool need_vm_enter;
};

#define KVM_EMULATOR_BUG_ON(cond, ctxt) \
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dee62362a360..8c19ad5e18d4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1165,11 +1165,20 @@ int nested_svm_vmexit(struct vcpu_svm *svm)

static void nested_svm_triple_fault(struct kvm_vcpu *vcpu)
{
+ struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
struct vcpu_svm *svm = to_svm(vcpu);

if (!vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_SHUTDOWN))
return;

+ /*
+ * In case of a triple fault, cancel the nested reentry. This may occur
+ * when the RSM instruction fails while attempting to restore the state
+ * from SMRAM.
+ */
+ if (ctxt->need_vm_enter)
+ svm->nested.nested_run_pending = 0;
+
kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
nested_svm_simple_vmexit(to_svm(vcpu), SVM_EXIT_SHUTDOWN);
}
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6329a306856b..9228699b4c1e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4950,7 +4950,19 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,

static void nested_vmx_triple_fault(struct kvm_vcpu *vcpu)
{
+ struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
+ /*
+ * In case of a triple fault, cancel the nested reentry. This may occur
+ * when the RSM instruction fails while attempting to restore the state
+ * from SMRAM.
+ */
+ if (ctxt->need_vm_enter)
+ vmx->nested.nested_run_pending = 0;
+
nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);
}

--
2.41.0