Re: [PATCH] x86/cpu: Fix migration safety with X86_BUG_NULL_SEL

From: Borislav Petkov
Date: Fri Oct 01 2021 - 10:20:06 EST


On Fri, Oct 01, 2021 at 02:33:49PM +0100, Jane Malalane wrote:
> Subject: Re: [PATCH] x86/cpu: Fix migration safety with X86_BUG_NULL_SEL
> ...
> Currently, Linux probes for X86_BUG_NULL_SEL unconditionally which
> makes it unsafe to migrate in a virtualised environment as the
> properties across the migration pool might differ.

Sorry but you need to explain "migration safety" in greater detail -
we're not all virtualizers.

> Zen3 adds the NullSelectorClearsBase bit to indicate that loading
> a NULL segment selector zeroes the base and limit fields, as well as
> just attributes. Zen2 also has this behaviour but doesn't have the
> NSCB bit.

I'm guessing you can detect Zen2 too, without the CPUID bit?

> When virtualised, NSCB might be cleared for migration safety,
> therefore we must not probe. Always honour the NSCB bit in this case,

Who is "we"?

> as the hypervisor is expected to synthesize it in the Zen2 case.
>
> Signed-off-by: Jane Malalane <jane.malalane@xxxxxxxxxx>
> ---
> CC: <x86@xxxxxxxxxx>
> CC: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CC: Ingo Molnar <mingo@xxxxxxxxxx>
> CC: Borislav Petkov <bp@xxxxxxxxx>
> CC: "H. Peter Anvin" <hpa@xxxxxxxxx>
> CC: Pu Wen <puwen@xxxxxxxx>
> CC: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> CC: Sean Christopherson <seanjc@xxxxxxxxxx>
> CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> CC: Yazen Ghannam <Yazen.Ghannam@xxxxxxx>
> CC: Brijesh Singh <brijesh.singh@xxxxxxx>
> CC: Huang Rui <ray.huang@xxxxxxx>
> CC: Andy Lutomirski <luto@xxxxxxxxxx>
> CC: Kim Phillips <kim.phillips@xxxxxxx>
> CC: <stable@xxxxxxxxxxxxxxx>
> ---
> arch/x86/include/asm/cpufeatures.h | 2 +-
> arch/x86/kernel/cpu/amd.c | 23 +++++++++++++++++++++++
> arch/x86/kernel/cpu/common.c | 6 ++----
> arch/x86/kernel/cpu/cpu.h | 1 +
> arch/x86/kernel/cpu/hygon.c | 23 +++++++++++++++++++++++
> 5 files changed, 50 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d0ce5cfd3ac1..f571e4f6fe83 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -96,7 +96,7 @@
> #define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in IA32 userspace */
> #define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in IA32 userspace */
> #define X86_FEATURE_REP_GOOD ( 3*32+16) /* REP microcode works well */
> -/* FREE! ( 3*32+17) */
> +#define X86_FEATURE_NSCB ( 3*32+17) /* Null Selector Clears Base */

You don't really need that bit definition - you check CPUID in the
early_init_amd/hygon() functions and that is enough. IOW, based on that
CPUID bit, you can call detect_null_seg_behavior() and set (or not)
X86_BUG_NULL_SEG.

...

> diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
> index 6d50136f7ab9..765f1556d964 100644
> --- a/arch/x86/kernel/cpu/hygon.c
> +++ b/arch/x86/kernel/cpu/hygon.c
> @@ -264,6 +264,29 @@ static void early_init_hygon(struct cpuinfo_x86 *c)
> if (c->x86_power & BIT(14))
> set_cpu_cap(c, X86_FEATURE_RAPL);
>
> + /*
> + * Zen1 and earlier CPUs don't clear segment base/limits when
> + * loading a NULL selector. This has been designated
> + * X86_BUG_NULL_SEG.
> + *
> + * Zen3 CPUs advertise Null Selector Clears Base in CPUID.
> + * Zen2 CPUs also have this behaviour, but no CPUID bit.
> + *
> + * A hypervisor may sythesize the bit, but may also hide it
> + * for migration safety, so we must not probe for model
> + * specific behaviour when virtualised.
> + */
> + if (c->extended_cpuid_level >= 0x80000021 &&
> + cpuid_eax(0x80000021) & BIT(6))
> + set_cpu_cap(c, X86_FEATURE_NSCB);
> +
> + if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && !cpu_has(c, X86_FEATURE_NSCB) &&
> + c->x86 == 0x18)
> + detect_null_seg_behavior(c);
> +
> + if (!cpu_has(c, X86_FEATURE_NSCB))
> + set_cpu_bug(c, X86_BUG_NULL_SEG);

Please integrate scripts/checkpatch.pl into your patch creation
workflow. Some of the warnings/errors *actually* make sense.

I'll show you that there's whitespace damage here:

ERROR: code indent should use tabs where possible
#238: FILE: arch/x86/kernel/cpu/hygon.c:281:
+^I set_cpu_cap(c, X86_FEATURE_NSCB);$

ERROR: code indent should use tabs where possible
#242: FILE: arch/x86/kernel/cpu/hygon.c:285:
+ detect_null_seg_behavior(c);$

WARNING: please, no spaces at the start of a line
#242: FILE: arch/x86/kernel/cpu/hygon.c:285:
+ detect_null_seg_behavior(c);$

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette