Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

From: Sean Christopherson
Date: Fri Oct 09 2020 - 00:04:59 EST


On Thu, Oct 08, 2020 at 09:18:18PM +0300, stsp wrote:
> 08.10.2020 20:59, Sean Christopherson пишет:
> >On Thu, Oct 08, 2020 at 07:00:13PM +0300, stsp wrote:
> >>07.10.2020 04:44, Sean Christopherson пишет:
> >>>Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
> >>Hi Sean & KVM devs.
> >>
> >>I tested the patches, and wherever I
> >>set VMXE in CR4, I now get
> >>KVM: KVM_SET_SREGS: Invalid argument
> >>Before the patch I was able (with many
> >>problems, but still) to set VMXE sometimes.
> >>
> >>So its a NAK so far, waiting for an update. :)
> >IIRC, you said you were going to test on AMD? Assuming that's correct,
>
> Yes, that is true.
>
>
> > -EINVAL
> >is the expected behavior. KVM was essentially lying before; it never actually
> >set CR4.VMXE in hardware, it just didn't properply detect the error and so VMXE
> >was set in KVM's shadow of the guest's CR4.
>
> Hmm. But at least it was lying
> similarly on AMD and Intel CPUs. :)
> So I was able to reproduce the problems
> myself.
> Do you mean, any AMD tests are now useless, and we need to proceed with Intel
> tests only?

For anything VMXE related, yes.

> Then additional question.
> On old Intel CPUs we needed to set VMXE in guest to make it to work in
> nested-guest mode.
> Is it still needed even with your patches?
> Or the nested-guest mode will work now even on older Intel CPUs and KVM will
> set VMXE for us itself, when needed?

I'm struggling to even come up with a theory as to how setting VMXE from
userspace would have impacted KVM with unrestricted_guest=n, let alone fixed
anything.

CR4.VMXE must always be 1 in _hardware_ when VMX is on, including when running
the guest. But KVM forces vmcs.GUEST_CR4.VMXE=1 at all times, regardless of
the guest's actual value (the guest sees a shadow value when it reads CR4).

And unless I grossly misunderstand dosemu2, it's not doing anything related to
nested virtualization, i.e. the stuffing VMXE=1 for the guest's shadow value
should have absolutely zero impact.

More than likely, VMXE was a red herring. Given that the reporter is also
seeing the same bug on bare metal after moving to kernel 5.4, odds are good
the issue is related to unrestricted_guest=n and has nothing to do with nVMX.