Re: [RFC PATCH v2 04/11] KVM: VMX: Add IA32_SPEC_CTRL virtualization support

From: Binbin Wu
Date: Sun Apr 16 2023 - 23:17:48 EST



On 4/14/2023 2:25 PM, Chao Gao wrote:
From: Zhang Chen <chen.zhang@xxxxxxxxx>

Currently KVM disables interception of IA32_SPEC_CTRL after the guest
writes a non-zero value to it. After that, the guest is allowed to write
any value to hardware.

"virtualize IA32_SPEC_CTRL" is a new tertiary vm-exec control. This
feature allows KVM to specify that certain bits of the IA32_SPEC_CTRL
MSR cannot be modified by guest software.

Two VMCS fields are added:

IA32_SPEC_CTRL_MASK: bits that guest software cannot modify
IA32_SPEC_CTRL_SHADOW: value that guest software expects to be in the
IA32_SPEC_CTRL MSR

On rdmsr, the shadow value is returned. On wrmsr, EDX:EAX is written
to the IA32_SPEC_CTRL_SHADOW and (cur_val & mask) | (EDX:EAX & ~mask)
is written to the IA32_SPEC_CTRL MSR, where
* cur_val is the original value of IA32_SPEC_CTRL MSR
* mask is the value of IA32_SPEC_CTRL_MASK
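
The rdmsr/wrmsr behavior described above can be modeled in a few lines of C.
This is only a sketch for illustration: struct spec_ctrl_model and the
model_rdmsr()/model_wrmsr() helpers are hypothetical names, not KVM or
hardware interfaces, and the struct simply stands in for the MSR and the two
new VMCS fields.

```c
#include <stdint.h>

/* Hypothetical model of the state involved; not KVM's data structures. */
struct spec_ctrl_model {
	uint64_t hw;     /* value held in the IA32_SPEC_CTRL MSR */
	uint64_t shadow; /* IA32_SPEC_CTRL_SHADOW: what the guest expects to read */
	uint64_t mask;   /* IA32_SPEC_CTRL_MASK: bits the guest cannot modify */
};

/* Guest rdmsr: the shadow value is returned, not the hardware value. */
static uint64_t model_rdmsr(const struct spec_ctrl_model *m)
{
	return m->shadow;
}

/* Guest wrmsr: the shadow takes the written value, masked bits keep cur_val. */
static void model_wrmsr(struct spec_ctrl_model *m, uint64_t val)
{
	m->shadow = val;
	m->hw = (m->hw & m->mask) | (val & ~m->mask);
}
```

For example, with mask = 0x1 and a current hardware value of 0x1, a guest
write of 0x4 leaves 0x5 in the MSR while a subsequent read returns 0x4.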

Add a mask e.g.,

Should this be "i.e." rather than "e.g."?


loaded_vmcs->spec_ctrl_mask to represent the bits guest
shouldn't change. It is 0 for now and some bits will be added by
following patches. Use per-vmcs cache to avoid unnecessary vmcs_write()
on nested transition because the mask is expected to be rarely changed
and the same for vmcs01 and vmcs02.
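
The per-vmcs cache idea can be sketched as follows. This is an illustrative
assumption about how such a cache avoids redundant writes, not the patch's
actual code: struct loaded_vmcs_model, fake_vmcs_write64() and
model_set_spec_ctrl_mask() are hypothetical stand-ins, and the counter only
exists to make the skipped writes visible.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-vmcs cache of the mask. */
struct loaded_vmcs_model {
	uint64_t spec_ctrl_mask; /* last value written to the VMCS field */
};

/* Counts simulated VMCS writes; a real vmcs_write64() touches the VMCS. */
static unsigned int vmcs_write_count;

static void fake_vmcs_write64(uint64_t val)
{
	(void)val;
	vmcs_write_count++;
}

/* Write the mask only when it differs from the cached value. */
static void model_set_spec_ctrl_mask(struct loaded_vmcs_model *v, uint64_t mask)
{
	if (v->spec_ctrl_mask == mask)
		return; /* unchanged, e.g. on a nested transition */

	fake_vmcs_write64(mask);
	v->spec_ctrl_mask = mask;
}
```

Since the mask is expected to change rarely, most calls on nested transitions
would take the early return.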

To prevent guest from changing the bits in the mask, enable "virtualize
IA32_SPEC_CTRL" if supported or emulate its behavior by intercepting
the IA32_SPEC_CTRL MSR. Emulating "virtualize IA32_SPEC_CTRL" behavior
is mainly to give the same capability to KVM running on potentially broken
hardware or L1 guests.

To avoid L2 evading the enforcement, enable "virtualize IA32_SPEC_CTRL"
in vmcs02. Always update the guest (shadow) value of IA32_SPEC_CTRL MSR
and the mask to preserve them across nested transitions. Note that the
shadow value may be changed because L2 may access the IA32_SPEC_CTRL
directly and the mask may be changed due to migration when L2 vCPUs are
running.

Co-developed-by: Chao Gao <chao.gao@xxxxxxxxx>
Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
Signed-off-by: Zhang Chen <chen.zhang@xxxxxxxxx>
Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
Tested-by: Jiaan Lu <jiaan.lu@xxxxxxxxx>
---
arch/x86/include/asm/vmx.h | 5 ++++
arch/x86/include/asm/vmxfeatures.h | 2 ++
arch/x86/kvm/vmx/capabilities.h | 5 ++++
arch/x86/kvm/vmx/nested.c | 13 ++++++++++
arch/x86/kvm/vmx/vmcs.h | 2 ++
arch/x86/kvm/vmx/vmx.c | 34 ++++++++++++++++++++-----
arch/x86/kvm/vmx/vmx.h | 40 +++++++++++++++++++++++++++++-
7 files changed, 94 insertions(+), 7 deletions(-)

[...]

@@ -750,4 +766,26 @@ static inline bool guest_cpuid_has_evmcs(struct kvm_vcpu *vcpu)
to_vmx(vcpu)->nested.enlightened_vmcs_enabled;
}
+static inline u64 vmx_get_guest_spec_ctrl(struct vcpu_vmx *vmx)
+{
+ return vmx->guest_spec_ctrl;
+}
+
+static inline void vmx_set_guest_spec_ctrl(struct vcpu_vmx *vmx, u64 val)
+{
+ vmx->guest_spec_ctrl = val;
+
+ /*
+ * For simplicity, always keep IA32_SPEC_CTRL_SHADOW up-to-date,
+ * regardless of the MSR intercept state.

It is better to use "IA32_SPEC_CTRL" explicitly instead of "the MSR" to avoid misunderstanding.


+ */
+ if (cpu_has_spec_ctrl_virt())
+ vmcs_write64(IA32_SPEC_CTRL_SHADOW, val);
+
+ /*
+ * Update the effective value of IA32_SPEC_CTRL to reflect changes to
+ * guest's IA32_SPEC_CTRL. Bits in the mask should always be set.
+ */
+ vmx->spec_ctrl = val | vmx_get_spec_ctrl_mask(vmx);
+}
#endif /* __KVM_X86_VMX_H */