Re: [PATCH 00/14] KVM: nVMX: Use vmcs_config for setting up nested VMX MSRs

From: Vitaly Kuznetsov
Date: Tue Jun 28 2022 - 12:02:42 EST


Jim Mattson <jmattson@xxxxxxxxxx> writes:

> On Tue, Jun 28, 2022 at 7:04 AM Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>>

...

>> Jim Mattson <jmattson@xxxxxxxxxx> writes:
>>
>> > Just checking that this doesn't introduce any backwards-compatibility
>> > issues. That is, all features that were reported as being available in
>> > the past should still be available moving forward.
>> >
>>
>> All the controls nested_vmx_setup_ctls_msrs() set are in the newly
>> introduced KVM_REQ_VMX_*/KVM_OPT_VMX_* sets so we should be good here
>> (unless I screwed up, of course).
>>
>> There's going to be some changes though. E.g this series was started by
>> Anirudh's report when KVM was exposing SECONDARY_EXEC_TSC_SCALING while
>> running on KVM and using eVMCS which doesn't support the control. This
>> is a bug and I don't think we need and 'bug compatibility' here.
>
> You cannot force VM termination on a kernel upgrade. On live migration
> from an older kernel, the new kernel must be willing to accept the
> suspended state of a VM that was running under the older kernel. In
> particular, the new KVM_SET_MSRS must accept the values of the VMX
> capability MSRS that userspace obtains from the older KVM_GET_MSRS. I
> don't know if this is what you are referring to as "bug
> compatibility," but if it is, then we absolutely do need it.
>

Oh, right you are, we do seem to have a problem. Even for eVMCS case,
the fact that we expose a feature which can't be used in VMX control
MSRs doesn't mean that the VM is broken. In particular, the VM may not
be using VMX features at all. Same goes to PERF_GLOBAL_CTRL errata.

vmx_restore_control_msr() currenly does strict checking of the supplied
data against what was initially set by nested_vmx_setup_ctls_msrs(),
this basically means we cannot drop feature bits, just add them. Out of
top of my head I don't see a solution other than relaxing the check by
introducing a "revoke list"... Another questions is whether we want
guest visible MSR value to remain like it was before migration or we can
be brave and clear 'broken' feature bits there (the features are
'broken' so they couldn't be in use, right?). I'm not sure.

Anirudh, the same concern applies to your 'intermediate' patch too.

Smart ideas on what can be done are more than welcome)

--
Vitaly