Re: [RFC PATCH v2 04/11] KVM: VMX: Add IA32_SPEC_CTRL virtualization support

From: Chenyi Qiang
Date: Mon Apr 17 2023 - 02:49:22 EST




On 4/14/2023 2:25 PM, Chao Gao wrote:
> From: Zhang Chen <chen.zhang@xxxxxxxxx>
>
> Currently KVM disables interception of IA32_SPEC_CTRL after a non-0 is
> written to IA32_SPEC_CTRL by guest. Then, guest is allowed to write any
> value to hardware.
>
> "virtualize IA32_SPEC_CTRL" is a new tertiary vm-exec control. This
> feature allows KVM to specify that certain bits of the IA32_SPEC_CTRL
> MSR cannot be modified by guest software.
>
> Two VMCS fields are added:
>
> IA32_SPEC_CTRL_MASK: bits that guest software cannot modify
> IA32_SPEC_CTRL_SHADOW: value that guest software expects to be in the
> IA32_SPEC_CTRL MSR
>
> On rdmsr, the shadow value is returned. On wrmsr, EDX:EAX is written
> to the IA32_SPEC_CTRL_SHADOW and (cur_val & mask) | (EDX:EAX & ~mask)
> is written to the IA32_SPEC_CTRL MSR, where
> * cur_val is the original value of IA32_SPEC_CTRL MSR
> * mask is the value of IA32_SPEC_CTRL_MASK
>
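
To check my reading of the rdmsr/wrmsr semantics described above, here is a
minimal user-space sketch. The struct and the model_*() helpers are
hypothetical names made up for illustration. They are not part of the patch
and do not touch any VMCS field.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant state; not a KVM structure. */
struct spec_ctrl_model {
	uint64_t hw;     /* value held in the IA32_SPEC_CTRL MSR */
	uint64_t mask;   /* IA32_SPEC_CTRL_MASK: bits the guest cannot modify */
	uint64_t shadow; /* IA32_SPEC_CTRL_SHADOW: value the guest expects */
};

/* rdmsr: the guest always sees the shadow value. */
static uint64_t model_rdmsr(const struct spec_ctrl_model *m)
{
	return m->shadow;
}

/*
 * wrmsr: the full value goes to the shadow; only the unmasked bits
 * reach the MSR, the masked bits keep their current value.
 */
static void model_wrmsr(struct spec_ctrl_model *m, uint64_t val)
{
	m->shadow = val;
	m->hw = (m->hw & m->mask) | (val & ~m->mask);
}
```

With mask = 0x1 and the MSR currently 0x1, a guest write of 0x4 leaves 0x5 in
the MSR while the guest reads back 0x4.
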
> Add a mask e.g., loaded_vmcs->spec_ctrl_mask to represent the bits guest
> shouldn't change. It is 0 for now and some bits will be added by
> following patches. Use per-vmcs cache to avoid unnecessary vmcs_write()
> on nested transition because the mask is expected to be rarely changed
> and the same for vmcs01 and vmcs02.
>
> To prevent guest from changing the bits in the mask, enable "virtualize
> IA32_SPEC_CTRL" if supported or emulate its behavior by intercepting
> the IA32_SPEC_CTRL msr. Emulating "virtualize IA32_SPEC_CTRL" behavior
> is mainly to give the same capability to KVM running on potentially broken
> hardware or L1 guests.
>
> To avoid L2 evading the enforcement, enable "virtualize IA32_SPEC_CTRL"
> in vmcs02. Always update the guest (shadow) value of IA32_SPEC_CTRL MSR
> and the mask to preserve them across nested transitions. Note that the
> shadow value may be changed because L2 may access the IA32_SPEC_CTRL
> directly and the mask may be changed due to migration when L2 vCPUs are
> running.
>
> Co-developed-by: Chao Gao <chao.gao@xxxxxxxxx>
> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
> Signed-off-by: Zhang Chen <chen.zhang@xxxxxxxxx>
> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>

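Duplicated Signed-off-by for Chao Gao. Drop one and move the Co-developed-by
down, as in the other patches, so that it sits directly above the remaining
Signed-off-by.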

> Tested-by: Jiaan Lu <jiaan.lu@xxxxxxxxx>
> ---
> arch/x86/include/asm/vmx.h | 5 ++++
> arch/x86/include/asm/vmxfeatures.h | 2 ++
> arch/x86/kvm/vmx/capabilities.h | 5 ++++
> arch/x86/kvm/vmx/nested.c | 13 ++++++++++
> arch/x86/kvm/vmx/vmcs.h | 2 ++
> arch/x86/kvm/vmx/vmx.c | 34 ++++++++++++++++++++-----
> arch/x86/kvm/vmx/vmx.h | 40 +++++++++++++++++++++++++++++-
> 7 files changed, 94 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 498dc600bd5c..b9f88ecf20c3 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -81,6 +81,7 @@
> * Definitions of Tertiary Processor-Based VM-Execution Controls.
> */
> #define TERTIARY_EXEC_IPI_VIRT VMCS_CONTROL_BIT(IPI_VIRT)
> +#define TERTIARY_EXEC_SPEC_CTRL_VIRT VMCS_CONTROL_BIT(SPEC_CTRL_VIRT)
>
> #define PIN_BASED_EXT_INTR_MASK VMCS_CONTROL_BIT(INTR_EXITING)
> #define PIN_BASED_NMI_EXITING VMCS_CONTROL_BIT(NMI_EXITING)
> @@ -233,6 +234,10 @@ enum vmcs_field {
> TERTIARY_VM_EXEC_CONTROL_HIGH = 0x00002035,
> PID_POINTER_TABLE = 0x00002042,
> PID_POINTER_TABLE_HIGH = 0x00002043,
> + IA32_SPEC_CTRL_MASK = 0x0000204A,
> + IA32_SPEC_CTRL_MASK_HIGH = 0x0000204B,
> + IA32_SPEC_CTRL_SHADOW = 0x0000204C,
> + IA32_SPEC_CTRL_SHADOW_HIGH = 0x0000204D,
> GUEST_PHYSICAL_ADDRESS = 0x00002400,
> GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401,
> VMCS_LINK_POINTER = 0x00002800,
> diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
> index c6a7eed03914..c70d0769b7d0 100644
> --- a/arch/x86/include/asm/vmxfeatures.h
> +++ b/arch/x86/include/asm/vmxfeatures.h
> @@ -89,4 +89,6 @@
>
> /* Tertiary Processor-Based VM-Execution Controls, word 3 */
> #define VMX_FEATURE_IPI_VIRT ( 3*32+ 4) /* Enable IPI virtualization */
> +#define VMX_FEATURE_SPEC_CTRL_VIRT ( 3*32+ 7) /* Enable IA32_SPEC_CTRL virtualization */
> +
> #endif /* _ASM_X86_VMXFEATURES_H */
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index 45162c1bcd8f..a7ab70b55acf 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -138,6 +138,11 @@ static inline bool cpu_has_tertiary_exec_ctrls(void)
> CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
> }
>
> +static __always_inline bool cpu_has_spec_ctrl_virt(void)

Do we need __always_inline here to force inlining? Or can we just use plain
inline, to align with the other cpu_has_xxx() helpers?
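
Something like the sketch below is what I have in mind. It is a standalone
approximation only: vmcs_config is reduced to a hypothetical one-field
stand-in, and the control bit is written out as bit 7 on the assumption that
this matches VMX_FEATURE_SPEC_CTRL_VIRT (3*32+7) from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for VMCS_CONTROL_BIT(SPEC_CTRL_VIRT), i.e. bit 7. */
#define TERTIARY_EXEC_SPEC_CTRL_VIRT (1ULL << 7)

/* Hypothetical reduction of vmcs_config to the one field used here. */
struct vmcs_config_model {
	uint64_t cpu_based_3rd_exec_ctrl;
};

static struct vmcs_config_model vmcs_config;

/* Plain inline, matching the neighbouring cpu_has_xxx() helpers. */
static inline bool cpu_has_spec_ctrl_virt(void)
{
	return vmcs_config.cpu_based_3rd_exec_ctrl & TERTIARY_EXEC_SPEC_CTRL_VIRT;
}
```

If a later patch has a caller that really needs the inlining guaranteed, it
would help to say so in the changelog.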

> +{
> + return vmcs_config.cpu_based_3rd_exec_ctrl & TERTIARY_EXEC_SPEC_CTRL_VIRT;
> +}
> +
> static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
> {
> return vmcs_config.cpu_based_2nd_exec_ctrl &