Re: [PATCH 2/2] KVM: nVMX: fix for disappearing L1->L2 event injection on L1 migration

From: Maxim Levitsky
Date: Thu Jan 07 2021 - 04:43:11 EST


On Thu, 2021-01-07 at 04:38 +0200, Maxim Levitsky wrote:
> On Wed, 2021-01-06 at 10:17 -0800, Sean Christopherson wrote:
> > On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > > If migration happens while an L2 entry with an injected event is
> > > pending, the event was not included in the migration state and would
> > > be lost, causing L2 to hang.
> >
> > But the injected event should still be in vmcs12 and KVM_STATE_NESTED_RUN_PENDING
> > should be set in the migration state, i.e. it should naturally be copied to
> > vmcs02 and thus (re)injected by vmx_set_nested_state(). Is nested_run_pending
> > not set? Is the info in vmcs12 somehow lost? Or am I off in left field...
>
> You are completely right.
> The injected event can be copied like that since the vmc(b|s)12 is migrated.
>
> We can safely disregard these two patches and the parallel two patches for SVM.
> I am almost sure that the real root cause of this bug was that we
> weren't restoring the nested run pending flag, and I even
> happened to fix this in this patch series.
>
> This is the trace of the bug (I removed the timestamps to make it easier to read)
>
>
> kvm_exit: vcpu 0 reason vmrun rip 0xffffffffa0688ffa info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
> kvm_nested_vmrun: rip: 0xffffffffa0688ffa vmcb: 0x0000000103594000 nrip: 0xffffffff814b3b01 int_ctl: 0x01000001 event_inj: 0x80000036 npt: on
> ^^^ this is the injection
> kvm_nested_intercepts: cr_read: 0010 cr_write: 0010 excp: 00060042 intercepts: bc4c8027 00006e7f 00000000
> kvm_fpu: unload
> kvm_userspace_exit: reason KVM_EXIT_INTR (10)
>
> ============================================================================
> migration happens here
> ============================================================================
>
> ...
> kvm_async_pf_ready: token 0xffffffff gva 0
> kvm_apic_accept_irq: apicid 0 vec 243 (Fixed|edge)
>
> kvm_nested_intr_vmexit: rip: 0x000000000000fff0
>
> ^^^^^ this is the nested vmexit that shouldn't have happened, since a nested run is pending;
> it erased the eventinj field, which had been migrated correctly just as you say.
>
> kvm_nested_vmexit_inject: reason: interrupt ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
> ...
>
>
> We did notice that this vmexit had a weird RIP, and I later explained
> it to myself: this is the default RIP that we put into the vmcb, and it
> hadn't yet been updated, since it is updated just prior to VM entry.
>
> My test has already survived about 170 iterations (usually it crashes after 20-40 iterations).
> I am leaving the stress test running all night; let's see if it survives.

And after leaving it overnight, the test survived about 1000 iterations.

Thanks again!

Best regards,
Maxim Levitsky


>
> V2 of the patches is on the way.
>
> Thanks again for the help!
>
> Best regards,
> Maxim Levitsky
>
> >
> > > Fix this by queueing the injected event in a similar manner to how we
> > > queue interrupted injections.
> > >
> > > This can be reproduced by running an IO intense task in L2,
> > > and repeatedly migrating the L1.
> > >
> > > Suggested-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > > ---
> > > arch/x86/kvm/vmx/nested.c | 12 ++++++------
> > > 1 file changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index e2f26564a12de..2ea0bb14f385f 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -2355,12 +2355,12 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> > > * Interrupt/Exception Fields
> > > */
> > > if (vmx->nested.nested_run_pending) {
> > > - vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> > > - vmcs12->vm_entry_intr_info_field);
> > > - vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
> > > - vmcs12->vm_entry_exception_error_code);
> > > - vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
> > > - vmcs12->vm_entry_instruction_len);
> > > + if ((vmcs12->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK))
> > > + vmx_process_injected_event(&vmx->vcpu,
> > > + vmcs12->vm_entry_intr_info_field,
> > > + vmcs12->vm_entry_instruction_len,
> > > + vmcs12->vm_entry_exception_error_code);
> > > +
> > > vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
> > > vmcs12->guest_interruptibility_info);
> > > vmx->loaded_vmcs->nmi_known_unmasked =
> > > --
> > > 2.26.2
> > >