Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification

From: Chao Gao
Date: Fri Jun 30 2023 - 06:28:20 EST


On Fri, Jun 30, 2023 at 05:34:28PM +0800, Yang, Weijiang wrote:
>
>On 6/17/2023 2:57 AM, Sean Christopherson wrote:
>> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> > On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>> > > On Thu, Jun 08, 2023, Weijiang Yang wrote:
>> > > > On 6/6/2023 5:08 PM, Chao Gao wrote:
>> > > > > On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>> > > > > > Add handling for Control Protection (#CP) exceptions(vector 21).
>> > > > > > The new vector is introduced for Intel's Control-Flow Enforcement
>> > > > > > Technology (CET) relevant violation cases.
>> > > > > >
>> > > > > > Although #CP belongs contributory exception class, but the actual
>> > > > > > effect is conditional on CET being exposed to guest. If CET is not
>> > > > > > available to guest, #CP falls back to non-contributory and doesn't
>> > > > > > have an error code.
>> > > > > This sounds weird. is this the hardware behavior? If yes, could you
>> > > > > point us to where this behavior is documented?
>> > > > It's not SDM documented behavior.
>> > > The #CP behavior needs to be documented. Please pester whoever you need to in
>> > > order to make that happen.
>> > Do you mean documentation for #CP as an generic exception or the behavior in
>> > KVM as this patch shows?
>> As I pointed out two *years* ago, this entry in the SDM
>>
>> — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>> holds: (1) the interruption type is hardware exception; (2) bit 0
>> (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>> (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>> indicates one of the following exceptions: #DF (vector 8), #TS (10),
>> #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>>
>> needs to read something like
>>
>> — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>> holds: (1) the interruption type is hardware exception; (2) bit 0
>> (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>> (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>> indicates one of the following exceptions: #DF (vector 8), #TS (10),
>> #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>>
>> [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>> support for the 1-setting of CR4.CET.
>
>Hi, Sean,
>
>I sent above change request to Gil(added in cc), but he shared different
>opinion on this issue:
>
>
>"It is the case that all CET-capable parts enumerate IA32_VMX_BASIC[56] as 1.
>
> However, there were earlier parts without CET that enumerated
>IA32_VMX_BASIC[56] as 0.
>
> On those parts, an attempt to inject an exception with vector 21 (#CP) with
>an error code would fail.
>
>(Injection of exception 21 with no error code would be allowed.)
>
> It may make things clearer if we document the statement above (all
>CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
>
>I will see if we can update future revisions of the SDM to clarify this."
>
>
>Then if this is the case,  kvm needs to check IA32_VMX_BASIC[56] before
>inject exception to nested VM.

And KVM can hide CET from guests if IA32_VMX_BASIC[56] is 0.

>
>And this patch could be removed, instead need another patch like below:
>
>diff --git a/arch/x86/include/asm/msr-index.h
>b/arch/x86/include/asm/msr-index.h
>index ad35355ee43e..6b33aacc8587 100644
>--- a/arch/x86/include/asm/msr-index.h
>+++ b/arch/x86/include/asm/msr-index.h
>@@ -1076,6 +1076,7 @@
> #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
> #define VMX_BASIC_MEM_TYPE_WB    6LLU
> #define VMX_BASIC_INOUT        0x0040000000000000LLU
>+#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU
>
> /* Resctrl MSRs: */
> /* - Intel: */
>diff --git a/arch/x86/kvm/vmx/capabilities.h
>b/arch/x86/kvm/vmx/capabilities.h
>index 85cffeae7f10..4b1ed4dc03bc 100644
>--- a/arch/x86/kvm/vmx/capabilities.h
>+++ b/arch/x86/kvm/vmx/capabilities.h
>@@ -79,6 +79,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
>     return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
> }
>
>+static inline bool cpu_has_vmx_basic_check_errcode(void)
>+{
>+    return    (((u64)vmcs_config.basic_cap << 32) &
>VMX_BASIC_CHECK_ERRCODE);
>+}
>+
> static inline bool cpu_has_virtual_nmis(void)
> {
>     return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
>diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>index 78524daa2cb2..92aa4fc3d233 100644
>--- a/arch/x86/kvm/vmx/nested.c
>+++ b/arch/x86/kvm/vmx/nested.c
>@@ -1227,9 +1227,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx,
>u64 data)
> {
>     const u64 feature_and_reserved =
>         /* feature (except bit 48; see below) */
>-        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
>+        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
>         /* reserved */
>-        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
>+        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
>     u64 vmx_basic = vmcs_config.nested.basic;
>
>     if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
>@@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct
>kvm_vcpu *vcpu,
>         should_have_error_code =
>             intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>             x86_exception_has_error_code(vector);
>-        if (CC(has_error_code != should_have_error_code))
>+        if (!cpu_has_vmx_basic_check_errcode() &&

We can skip computing should_have_error_code. and we should check if
IA32_VMX_BASIC[56] is set for this vCPU (i.e. in vmx->nested.msrs.basic)
rather than host/kvm capability.

>+            CC(has_error_code != should_have_error_code))
>             return -EINVAL;
>
>         /* VM-entry exception error code */
>@@ -6986,6 +6987,8 @@ static void nested_vmx_setup_basic(struct
>nested_vmx_msrs *msrs)
>
>     if (cpu_has_vmx_basic_inout())
>         msrs->basic |= VMX_BASIC_INOUT;
>+    if (cpu_has_vmx_basic_check_errcode())
>+        msrs->basic |= VMX_BASIC_CHECK_ERRCODE;
> }
>
> static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index d70f2e94b187..95c0eab7805c 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -2748,7 +2748,7 @@ static int setup_vmcs_config(struct vmcs_config
>*vmcs_conf,
>     rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
>
>     vmcs_conf->size = vmx_msr_high & 0x1fff;
>-    vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
>+    vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;
>
>     vmcs_conf->revision_id = vmx_msr_low;
>
>