Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

From: Andy Lutomirski
Date: Wed Jun 10 2020 - 20:15:50 EST


On Wed, Jun 10, 2020 at 5:00 PM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Wed, Jun 10, 2020 at 02:59:19PM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Jun 10, 2020, at 11:21 AM, David P. Reed <dpreed@xxxxxxxxxxxx> wrote:
> > >
> > > If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is
> > > done on all CPUs to allow the INIT IPI to function, since
> > > INIT is suppressed when CPUs are in VMX root operation.
> > > However, VMXOFF causes an undefined opcode (#UD) fault if the CPU is
> > > not in VMX operation, that is, if VMXON has not been executed, or if
> > > VMXOFF has already been executed, even though VMX is still enabled.
> >
> > I'm surprised. Wouldn't this mean that emergency reboots always fail if a VM
> > is running? I would think someone would have noticed before.
>
> The call to cpu_vmxoff() is conditioned on CR4.VMXE==1, which KVM toggles in
> tandem with VMXON and VMXOFF. Out-of-tree hypervisors presumably do the
> same. That's obviously not atomic though, e.g. VMXOFF will #UD if the
> vmxoff_nmi() NMI arrives between CR4.VMXE=1 and VMXON, or between VMXOFF
> and CR4.VMXE=0.

It would be nice for the commit message to say "this happens when
vmxoff_nmi() races with KVM's VMXON/VMXOFF toggling". Or the commit
message should say something else if the bug happens for a different
reason.

The race with KVM should be quite unusual, since it involves rebooting
concurrently with loading or unloading KVM.