Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction

From: Wilczynski, Michal
Date: Thu Feb 08 2024 - 08:21:32 EST

Next message: Huang Yiwei: "[PATCH v5] tracing: Support to dump instance traces by ftrace_dump_on_oops"
Previous message: Will Deacon: "Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize"
In reply to: Paolo Bonzini: "Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction"
Next in thread: Paolo Bonzini: "Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2/8/2024 11:31 AM, Paolo Bonzini wrote:
> On Thu, Jan 25, 2024 at 1:59 AM Yunhong Jiang
> <yunhong.jiang@xxxxxxxxxxxxxxx> wrote:
>> Would it be ok to move the followed emulator_leave_smm() code into
>> vmx_leave_smm, before setting nested_run_pending bit? It avoids changing
>> the generic emulator code.
>>
>> #ifdef CONFIG_X86_64
>> if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
>> return rsm_load_state_64(ctxt, &smram.smram64);
>> else
>> #endif
>> return rsm_load_state_32(ctxt, &smram.smram32);
>
> I agree with Michal that vendor code should not be in charge of
> calling rsm_load_state_*.
>
> However, I don't understand the problem or the fix.
>
> The possible causes are two, and in neither case the fix is to clear
> nested_run_pending:
>
> 1) if the problem is that setting nested_run_pending was premature,
> the correct fix in my opinion is to split the leave_smm vendor
> callback in "prepare" and "commit" parts; not to clear it later. See
> the attached, untested, patch.

Hi, I've tested the patch and it seems to work, both on Intel and AMD.
There was a problem with applying this chunk though:

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index ac8b7614e79d..3d18fa7db353 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -119,7 +119,8 @@ KVM_X86_OP(setup_mce)
#ifdef CONFIG_KVM_SMM
KVM_X86_OP(smi_allowed)
KVM_X86_OP() // <- This shouldn't be there I guess ?
-KVM_X86_OP(leave_smm)
+KVM_X86_OP(leave_smm_prepare)
+KVM_X86_OP(leave_smm_commit)
KVM_X86_OP(enable_smi_window)
#endif
KVM_X86_OP_OPTIONAL(dev_get_attr)

Anyway I was a bit averse to this approach as I noticed in the git log
that callbacks like e.g post_leave_smm() used to exist, but they were later
removed, so I though the maintainers don't like introducing extra
callbacks.

>
> 2) otherwise, if the problem is that we have not gone through the
> vmenter yet, then KVM needs to do that and _then_ inject the triple
> fault. The fix is to merge the .triple_fault and .check_nested_events
> callbacks, with something like the second attached patch - which
> probably has so many problems that I haven't even tried to compile it.

Well, in this case if we know that RSM will fail it doesn't seem to me
like it make sense to run vmenter just do kill the VM anyway, this would
be more confusing.

I've made the fix this way based on our discussion with Sean in v1, and
tried to mark the RSM instruction with a flag, as a one that needs
actual HW VMenter to complete succesfully, and based on that information
manipulate nested_run_pending.

>
> The first patch should be equivalent to yours, and I guess it is
> okay-ish with a few more comments that explain what's going on.

Sure, I'm fine with this. Thanks !

>
> Paolo

Next message: Huang Yiwei: "[PATCH v5] tracing: Support to dump instance traces by ftrace_dump_on_oops"
Previous message: Will Deacon: "Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize"
In reply to: Paolo Bonzini: "Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction"
Next in thread: Paolo Bonzini: "Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]