Re: [PATCH 2/9] KVM: x86: Emulate triple fault shutdown if RSM emulation fails

From: Vitaly Kuznetsov
Date: Thu Jun 10 2021 - 04:26:26 EST


Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> Use the recently introduced KVM_REQ_TRIPLE_FAULT to properly emulate
> shutdown if RSM from SMM fails.
>
> Note, entering shutdown after clearing the SMM flag and restoring NMI
> blocking is architecturally correct with respect to AMD's APM, which KVM
> also uses for SMRAM layout and RSM NMI blocking behavior. The APM says:
>
> An RSM causes a processor shutdown if an invalid-state condition is
> found in the SMRAM state-save area. Only an external reset, external
> processor-initialization, or non-maskable external interrupt (NMI) can
> cause the processor to leave the shutdown state.
>
> Of note is processor-initialization (INIT) as a valid shutdown wake
> event, as INIT is blocked by SMM, implying that entering shutdown also
> forces the CPU out of SMM.
>
> For recent Intel CPUs, restoring NMI blocking is technically wrong, but
> so is restoring NMI blocking in the first place, and Intel's RSM
> "architecture" is such a mess that just about anything is allowed and can
> be justified as micro-architectural behavior.
>
> Per the SDM:
>
> On Pentium 4 and later processors, shutdown will inhibit INTR and A20M
> but will not change any of the other inhibits. On these processors,
> NMIs will be inhibited if no action is taken in the SMI handler to
> uninhibit them (see Section 34.8).
>
> where Section 34.8 says:
>
> When the processor enters SMM while executing an NMI handler, the
> processor saves the SMRAM state save map but does not save the
> attribute to keep NMI interrupts disabled. Potentially, an NMI could be
> latched (while in SMM or upon exit) and serviced upon exit of SMM even
> though the previous NMI handler has still not completed.
>
> I.e. RSM unconditionally unblocks NMI, but shutdown on RSM does not,
> which is in direct contradiction of KVM's behavior. But, as mentioned
> above, KVM follows AMD architecture and restores NMI blocking on RSM, so
> that micro-architectural detail is already lost.
>
> And for Pentium era CPUs, SMI# can break shutdown, meaning that at least
> some Intel CPUs fully leave SMM when entering shutdown:
>
> In the shutdown state, Intel processors stop executing instructions
> until a RESET#, INIT# or NMI# is asserted. While Pentium family
> processors recognize the SMI# signal in shutdown state, P6 family and
> Intel486 processors do not.
>
> In other words, the fact that Intel CPUs have implemented the two
> extremes gives KVM carte blanche when it comes to honoring Intel's
> architecture for handling shutdown during RSM.
>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> arch/x86/kvm/emulate.c | 12 +++++++-----
> arch/x86/kvm/kvm_emulate.h | 1 +
> arch/x86/kvm/x86.c | 6 ++++++
> 3 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 5e5de05a8fbf..0603a2c79093 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2683,7 +2683,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
> * state-save area.
> */
> if (ctxt->ops->pre_leave_smm(ctxt, buf))
> - return X86EMUL_UNHANDLEABLE;
> + goto emulate_shutdown;
>
> #ifdef CONFIG_X86_64
> if (emulator_has_longmode(ctxt))
> @@ -2692,14 +2692,16 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
> #endif
> ret = rsm_load_state_32(ctxt, buf);
>
> - if (ret != X86EMUL_CONTINUE) {
> - /* FIXME: should triple fault */
> - return X86EMUL_UNHANDLEABLE;
> - }
> + if (ret != X86EMUL_CONTINUE)
> + goto emulate_shutdown;
>
> ctxt->ops->post_leave_smm(ctxt);
>
> return X86EMUL_CONTINUE;
> +
> +emulate_shutdown:
> + ctxt->ops->triple_fault(ctxt);
> + return X86EMUL_UNHANDLEABLE;

I'm probably missing something, but what's the desired effect of both
raising KVM_REQ_TRIPLE_FAULT and returning X86EMUL_UNHANDLEABLE here?

I've modified the smm selftest to see what's happening:

diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
index 613c42c5a9b8..cf215cd2c6e2 100644
--- a/tools/testing/selftests/kvm/x86_64/smm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
@@ -147,6 +147,11 @@ int main(int argc, char *argv[])
"Unexpected stage: #%x, got %x",
stage, stage_reported);

+ if (stage_reported == SMRAM_STAGE) {
+ /* corrupt smram */
+ memset(addr_gpa2hva(vm, SMRAM_GPA) + 0xfe00, 0xff, 512);
+ }
+
state = vcpu_save_state(vm, VCPU_ID);
kvm_vm_release(vm);
kvm_vm_restart(vm, O_RDWR);

What I see is:

smm_test-7600 [002] 4497.073918: kvm_exit: reason EXIT_RSM rip 0x8004 info 0 0
smm_test-7600 [002] 4497.073921: kvm_emulate_insn: 1000000:8004: 0f aa
smm_test-7600 [002] 4497.073924: kvm_smm_transition: vcpu 1: leaving SMM, smbase 0x1000000
smm_test-7600 [002] 4497.073928: kvm_emulate_insn: 0:8004: 0f aa FAIL
smm_test-7600 [002] 4497.073929: kvm_fpu: unload
smm_test-7600 [002] 4497.073930: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)

If I change X86EMUL_UNHANDLEABLE to X86EMUL_CONTINUE, the triple fault
does happen (why don't we print triple fault in the trace by default,
BTW?):

smm_test-16810 [006] 5117.007220: kvm_exit: reason EXIT_RSM rip 0x8004 info 0 0
smm_test-16810 [006] 5117.007222: kvm_emulate_insn: 1000000:8004: 0f aa
smm_test-16810 [006] 5117.007225: kvm_smm_transition: vcpu 1: leaving SMM, smbase 0x1000000
smm_test-16810 [006] 5117.007229: bputs: vcpu_enter_guest: KVM_REQ_TRIPLE_FAULT
smm_test-16810 [006] 5117.007230: kvm_fpu: unload
smm_test-16810 [006] 5117.007230: kvm_userspace_exit: reason KVM_EXIT_SHUTDOWN (8)

So should we actually return X86EMUL_CONTINUE when we queue
KVM_REQ_TRIPLE_FAULT here?

(Initially, my comment was supposed to be 'why don't you add
TRIPLE_FAULT to smm selftest?' but the above overshadows it)

> }
>
> static void
> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> index 3e870bf9ca4d..9c34aa60e45f 100644
> --- a/arch/x86/kvm/kvm_emulate.h
> +++ b/arch/x86/kvm/kvm_emulate.h
> @@ -233,6 +233,7 @@ struct x86_emulate_ops {
> int (*pre_leave_smm)(struct x86_emulate_ctxt *ctxt,
> const char *smstate);
> void (*post_leave_smm)(struct x86_emulate_ctxt *ctxt);
> + void (*triple_fault)(struct x86_emulate_ctxt *ctxt);
> int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr);
> };
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 54d212fe9b15..cda148cf06fa 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7123,6 +7123,11 @@ static void emulator_post_leave_smm(struct x86_emulate_ctxt *ctxt)
> kvm_smm_changed(emul_to_vcpu(ctxt));
> }
>
> +static void emulator_triple_fault(struct x86_emulate_ctxt *ctxt)
> +{
> + kvm_make_request(KVM_REQ_TRIPLE_FAULT, emul_to_vcpu(ctxt));
> +}
> +
> static int emulator_set_xcr(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr)
> {
> return __kvm_set_xcr(emul_to_vcpu(ctxt), index, xcr);
> @@ -7172,6 +7177,7 @@ static const struct x86_emulate_ops emulate_ops = {
> .set_hflags = emulator_set_hflags,
> .pre_leave_smm = emulator_pre_leave_smm,
> .post_leave_smm = emulator_post_leave_smm,
> + .triple_fault = emulator_triple_fault,
> .set_xcr = emulator_set_xcr,
> };

--
Vitaly