Re: [PATCH v3] x86/speculation, KVM: only IBPB for switch_mm_always_ibpb on vCPU load

From: Jon Kohler
Date: Thu Apr 28 2022 - 08:53:06 EST




> On Apr 22, 2022, at 12:21 PM, Jon Kohler <jon@xxxxxxxxxxx> wrote:
>
> In vmx_vcpu_load_vmcs() and svm_vcpu_load(), respect the user-controlled
> configuration for conditional IBPB and only issue the IBPB MSR write when
> switching between different guest vCPUs IFF switch_mm_always_ibpb is
> enabled. This fixes a situation where the kernel issues IBPB
> unconditionally even when only conditional IBPB is enabled.
>
> If a user has the spectre_v2_user mitigation enabled, in any
> configuration, and the underlying processor supports X86_FEATURE_IBPB,
> then X86_FEATURE_USE_IBPB is set and any call to
> indirect_branch_prediction_barrier() will write the IBPB MSR.
>
> Depending on the spectre_v2_user configuration, either the
> switch_mm_always_ibpb or the switch_mm_cond_ibpb static key will be
> enabled.
>
> Both switch_mm_always_ibpb and switch_mm_cond_ibpb are handled by
> switch_mm() -> cond_mitigation(), which works well in cases where
> switching vCPUs (i.e. switching tasks) also switches mm_struct;
> however, this misses a paranoid case where user space may be running
> multiple guests in a single process (i.e. single mm_struct). This
> presents two issues:
>
> Issue 1:
> This paranoid case is already covered by vmx_vcpu_load_vmcs and
> svm_vcpu_load; however, this is done by calling
> indirect_branch_prediction_barrier() and thus the kernel
> unconditionally issues IBPB if X86_FEATURE_USE_IBPB is set.
>
> Issue 2:
> For a conditional configuration, this paranoid case is nonsensical.
> If userspace runs multiple VMs in the same process, enables cond_ibpb,
> _and_ sets TIF_SPEC_IB, then it isn't getting full protection in any case,
> e.g. if userspace is handling an exit-to-userspace condition for two
> vCPUs from different VMs, then the kernel could switch between those
> two vCPUs' tasks without bouncing through KVM and thus without doing
> KVM's IBPB.
>
> Fix both by using an intermediary call to x86_virt_guest_switch_ibpb(),
> which gates the IBPB MSR write IFF switch_mm_always_ibpb is true.
>
> switch_mm_cond_ibpb is intentionally ignored from the KVM code side
> as it really is nonsensical given the common case is already well
> covered by switch_mm(), so issuing an additional IBPB from KVM is
> just pure overhead.
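>
> For illustration, a heavily simplified sketch of what cond_mitigation()
> in arch/x86/mm/tlb.c already does on an mm switch (the TIF_SPEC_IB and
> last-user-mm bookkeeping is elided, and mm_wants_ibpb() is a made-up
> placeholder, so this is not the exact upstream code):
>
>     static void cond_mitigation_sketch(struct task_struct *next)
>     {
>             /* Conditional mode: only tasks that opted in via TIF_SPEC_IB. */
>             if (static_branch_likely(&switch_mm_cond_ibpb) &&
>                 mm_wants_ibpb(next))    /* placeholder for the real check */
>                     indirect_branch_prediction_barrier();
>
>             /* Always mode: IBPB on every switch to a different mm. */
>             if (static_branch_unlikely(&switch_mm_always_ibpb))
>                     indirect_branch_prediction_barrier();
>     }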
>
> Note: switch_mm_always_ibpb key is user controlled via spectre_v2_user
> and will be true for the following configurations:
> spectre_v2_user=on
> spectre_v2_user=prctl,ibpb
> spectre_v2_user=seccomp,ibpb
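>
> For reference, the key is flipped from spectre_v2_user_select_mitigation()
> in bugs.c roughly as follows (paraphrased sketch, not the exact upstream
> code; cmd is the parsed spectre_v2_user= command):
>
>     if (boot_cpu_has(X86_FEATURE_IBPB)) {
>             setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
>
>             switch (cmd) {
>             case SPECTRE_V2_USER_CMD_FORCE:         /* spectre_v2_user=on */
>             case SPECTRE_V2_USER_CMD_PRCTL_IBPB:    /* =prctl,ibpb */
>             case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:  /* =seccomp,ibpb */
>                     static_branch_enable(&switch_mm_always_ibpb);
>                     break;
>             case SPECTRE_V2_USER_CMD_PRCTL:
>             case SPECTRE_V2_USER_CMD_AUTO:
>             case SPECTRE_V2_USER_CMD_SECCOMP:
>                     static_branch_enable(&switch_mm_cond_ibpb);
>                     break;
>             default:
>                     break;
>             }
>     }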
>
> Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx>
> Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> Cc: Waiman Long <longman@xxxxxxxxxx>
> ---
> v1 -> v2:
> - Addressed comments on approach from Sean.
> v2 -> v3:
> - Updated spec-ctrl.h comments and commit msg to incorporate
> additional feedback from Sean.
>

Gentle ping on this one, thanks, Jon

> arch/x86/include/asm/spec-ctrl.h | 14 ++++++++++++++
> arch/x86/kernel/cpu/bugs.c | 6 +++++-
> arch/x86/kvm/svm/svm.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 4 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
> index 5393babc0598..99d3341d2e21 100644
> --- a/arch/x86/include/asm/spec-ctrl.h
> +++ b/arch/x86/include/asm/spec-ctrl.h
> @@ -85,4 +85,18 @@ static inline void speculative_store_bypass_ht_init(void) { }
> extern void speculation_ctrl_update(unsigned long tif);
> extern void speculation_ctrl_update_current(void);
>
> +/*
> + * Issue IBPB when switching guest vCPUs IFF switch_mm_always_ibpb.
> + * For the more common case of running VMs in their own dedicated process,
> + * switching vCPUs that belong to different VMs, i.e. switching tasks,
> + * will also switch mm_structs and thus do IBPB via cond_mitigation();
> + * however, in the always_ibpb case, take a paranoid approach and issue
> + * IBPB on both switch_mm() and vCPU switch.
> + */
> +static inline void x86_virt_guest_switch_ibpb(void)
> +{
> + if (static_branch_unlikely(&switch_mm_always_ibpb))
> + indirect_branch_prediction_barrier();
> +}
> +
> #endif
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 6296e1ebed1d..6aafb0279cbc 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -68,8 +68,12 @@ u64 __ro_after_init x86_amd_ls_cfg_ssbd_mask;
> DEFINE_STATIC_KEY_FALSE(switch_to_cond_stibp);
> /* Control conditional IBPB in switch_mm() */
> DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
> -/* Control unconditional IBPB in switch_mm() */
> +/* Control unconditional IBPB in both switch_mm() and
> + * x86_virt_guest_switch_ibpb().
> + * See notes on x86_virt_guest_switch_ibpb() for KVM use case details.
> + */
> DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
> +EXPORT_SYMBOL_GPL(switch_mm_always_ibpb);
>
> /* Control MDS CPU buffer clear before returning to user space */
> DEFINE_STATIC_KEY_FALSE(mds_user_clear);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index bd4c64b362d2..fc08c94df888 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1302,7 +1302,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
> if (sd->current_vmcb != svm->vmcb) {
> sd->current_vmcb = svm->vmcb;
> - indirect_branch_prediction_barrier();
> + x86_virt_guest_switch_ibpb();
> }
> if (kvm_vcpu_apicv_active(vcpu))
> __avic_vcpu_load(vcpu, cpu);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 04d170c4b61e..a8eed9b8221b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1270,7 +1270,7 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> * The L1 VMM can protect itself with retpolines, IBPB or IBRS.
> */
> if (!buddy || WARN_ON_ONCE(buddy->vmcs != prev))
> - indirect_branch_prediction_barrier();
> + x86_virt_guest_switch_ibpb();
> }
>
> if (!already_loaded) {
> --
> 2.30.1 (Apple Git-130)
>