Re: [PATCH v1] KVM: nVMX: Fix handling triple fault on RSM instruction

From: Wilczynski, Michal
Date: Wed Jan 03 2024 - 18:03:32 EST




On 1/2/2024 8:57 PM, Sean Christopherson wrote:
>>
>> Additionally, while the proposed code fixes a VMX-specific issue, SVM
>> might also suffer from a similar problem, as it also uses its own
>> nested_run_pending variable.
>>
>> Reported-by: Zheyu Ma <zheyuma97@xxxxxxxxx>
>> Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@xxxxxxxxxxxxxx
>
> Fixes: 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM")

Thanks!

>
>> Signed-off-by: Michal Wilczynski <michal.wilczynski@xxxxxxxxx>
>> ---
>> arch/x86/kvm/vmx/nested.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index c5ec0ef51ff7..44432e19eea6 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -4904,7 +4904,16 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>>
>> static void nested_vmx_triple_fault(struct kvm_vcpu *vcpu)
>> {
>> + struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>> kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>> +
>> + /* In case of a triple fault, cancel the nested reentry. This may occur
>
> /*
> * Multi-line comments should look like this. Blah blah blab blah blah
> * blah blah blah blah.
> */

Sorry, I didn't notice, and checkpatch didn't complain. Other
subsystems, e.g. networking, don't enforce this style. I will make
sure to remember it next time.
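
For reference, this is how I understand the comment from the hunk would
look in the preferred style. It is only a minimal userspace sketch: the
struct and function names (toy_nested, toy_triple_fault) are stand-ins
invented for illustration, not the real KVM definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the nested state kept in struct vcpu_vmx. */
struct toy_nested {
	bool nested_run_pending;
};

static void toy_triple_fault(struct toy_nested *nested)
{
	assert(nested != NULL);

	/*
	 * In case of a triple fault, cancel the nested reentry.  This may
	 * occur when the RSM instruction fails while attempting to restore
	 * the state from SMRAM.
	 */
	nested->nested_run_pending = false;
}
```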

>
>> + * when the RSM instruction fails while attempting to restore the state
>> + * from SMRAM.
>> + */
>> + vmx->nested.nested_run_pending = 0;
>
> Argh. KVM's handling of SMIs while L2 is active is complete garbage. As explained
> by the comment in vmx_enter_smm(), the L2<->SMM transitions should have a completely
> custom flow and not piggyback/usurp nested VM-Exit/VM-Entry.
>
> /*
> * TODO: Implement custom flows for forcing the vCPU out/in of L2 on
> * SMI and RSM. Using the common VM-Exit + VM-Enter routines is wrong
> * SMI and RSM only modify state that is saved and restored via SMRAM.
> * E.g. most MSRs are left untouched, but many are modified by VM-Exit
> * and VM-Enter, and thus L2's values may be corrupted on SMI+RSM.
> */

I noticed this while working on the issue, and I would be very
interested in taking on this task and implementing the custom flows
mentioned. I hope you're fine with that.


> As a stop gap, something like this patch is not awful, though I would strongly
> prefer to be more precise and not clear it on all triple faults. We've had KVM
> bugs where KVM prematurely synthesizes triple fault on an actual nested VM-Enter,
> and those would be covered up by this fix.
>
> But due to nested_run_pending being (unnecessarily) buried in vendor structs, it
> might actually be easier to do a cleaner fix. E.g. add yet another flag to track
> that a hardware VM-Enter needs to be completed in order to complete instruction
> emulation.

Sounds like a good idea. I will experiment with that approach.
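
To check that I understand the suggestion, here is a rough userspace
model of what I have in mind. Everything in it is hypothetical: the
toy_vcpu struct, the emul_vmenter_pending flag name and the helper
functions are invented for illustration and do not exist in KVM. The
idea is that only a VM-Enter needed to complete emulation (e.g. RSM)
sets the extra flag, so a triple fault cancels the pending run only in
that case and does not cover up a premature triple fault on an actual
nested VM-Enter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in KVM. */
struct toy_vcpu {
	/* Models the vendor-specific nested_run_pending flag. */
	bool nested_run_pending;
	/* Invented flag: a VM-Enter is needed to finish emulation. */
	bool emul_vmenter_pending;
};

/* Models a nested VM-Enter requested directly by L1. */
static void toy_nested_vmenter(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);
	vcpu->nested_run_pending = true;
	vcpu->emul_vmenter_pending = false;
}

/* Models RSM emulation that has to re-enter L2 to complete. */
static void toy_rsm_reenter_l2(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);
	vcpu->nested_run_pending = true;
	vcpu->emul_vmenter_pending = true;
}

/* Returns true if the pending nested run was cancelled. */
static bool toy_triple_fault(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);

	/*
	 * Cancel the pending run only if it was set up to complete
	 * instruction emulation.  A triple fault on an actual nested
	 * VM-Enter leaves nested_run_pending alone so that such a bug
	 * stays visible.
	 */
	if (!vcpu->emul_vmenter_pending)
		return false;

	vcpu->emul_vmenter_pending = false;
	vcpu->nested_run_pending = false;
	return true;
}
```

In this model the stop-gap from the patch becomes conditional: the RSM
path clears the pending run on a triple fault, while the regular nested
VM-Enter path keeps it set.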

>
> And as alluded to above, there's another bug lurking. Events that are *emulated*
> by KVM must not be emulated until KVM knows the vCPU is at an instruction boundary.
> Specifically, enter_smm() shouldn't be invoked while KVM is in the middle of
> instruction emulation (even if "emulation" is just setting registers and skipping
> the instruction). Theoretically, that could be fixed by honoring the existing
> at_instruction_boundary flag for SMIs, but that'd be a rather large change and
> at_instruction_boundary is nowhere near accurate enough to use right now.
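
If I read this right, the rule is that an emulated event such as an SMI
may only be delivered once the vCPU is known to be at an instruction
boundary. Below is a toy userspace sketch of that rule. The struct and
the function names are invented for illustration; only the
at_instruction_boundary name is borrowed from the paragraph above, and
none of this reflects KVM's actual code paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a vCPU; not KVM's struct kvm_vcpu. */
struct toy_vcpu {
	bool at_instruction_boundary;
	bool smi_pending;
	bool in_smm;
};

/* Records an SMI without delivering it. */
static void toy_queue_smi(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);
	vcpu->smi_pending = true;
}

/* Returns true if the pending SMI was delivered, i.e. SMM was entered. */
static bool toy_try_inject_smi(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);

	/*
	 * Keep the SMI pending while an instruction is still being
	 * emulated; deliver it only at an instruction boundary.
	 */
	if (!vcpu->smi_pending || !vcpu->at_instruction_boundary)
		return false;

	vcpu->smi_pending = false;
	vcpu->in_smm = true;
	return true;
}
```
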
>
> Anyways, before we do anything, I'd like to get Maxim's input on what exactly was
> addressed by 759cbd59674a.

Thank you very much for such a comprehensive review! I've learned a lot.
I will try to help with the problems mentioned.

Michał