RE: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection

From: Li, Xin3
Date: Sun Nov 13 2022 - 23:10:22 EST




> -----Original Message-----
> From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Sent: Friday, November 11, 2022 3:58 AM
> To: Peter Zijlstra <peterz@xxxxxxxxxxxxx>; H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Christopherson,, Sean <seanjc@xxxxxxxxxx>; Li, Xin3 <xin3.li@xxxxxxxxx>;
> linux-kernel@xxxxxxxxxxxxxxxx; x86@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; bp@xxxxxxxxx;
> dave.hansen@xxxxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>
> Subject: Re: [PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for
> NMI/IRQ reinjection
>
> On 11/11/22 11:45, Peter Zijlstra wrote:
> >> What is "correct" in this context?
> >
> > I don't know since I don't really speak virt, but I could image the
> > regset that would match the vmrun (or whatever intel decided to call
> > that again) instruction.
>
> Right now it is not exactly that but close. The RIP is somewhere in
> vmx_do_interrupt_nmi_irqoff; CS/SS are correct (i.e. it's not like they point to
> guest values!) and other registers including RSP and RFLAGS are consistent with
> the RIP.
>
> >> Currently KVM basically stuff random data into pt_regs; this at least
> >> makes it explicitly zero.
> >
> > 🙁 Both is broken. Once again proving to me that virt is a bunch of
> > duck-tape at best.
>
> Except currently it is not random. At least I'm not alone in sometimes thinking I
> understand stuff when I actually don't.
>
> Zero is just wrong, I agree. Xin, if you don't want to poke at the IDT you need
> to build the pt_regs and pass them to the function you add in patch 5. In order
> to make it like Peter wants it, then:
>
> 1) for the flags just use X86_EFLAGS_FIXED
>
> 2) for CS/SS segment selectors use __KERNEL_CS and __KERNEL_DS
>
> 3) the RIP/RSP can be found respectively in (unsigned long)vmx_vmexit
> vmx->loaded_vmcs->host_state.rsp
>
> 4) the other registers can be taken from vcpu->arch.regs
>
> But I am not sure it's an improvement. It may be okay to zero the registers, but
> setting CS/RIP/SS/RSP/RFLAGS to the actual processor values in
> vmx_do_interrupt_nmi_irqoff is safer.
>
> I'm not sure who would set orig_rax, but I haven't looked too closely.
> Perhaps that's not a problem, but if so it has to be documented in the commit
> message.

No IRQ handler uses a pt_regs structure as an argument, while NMI handlers do.

For NMI based profiling, what it wants maybe is more of the guest RIP?
I might make it over complicated by mixing host/guest profiling.

> Paolo