Re: [PATCH] x86/lib: don't use MMX before FPU initialization

From: Andy Lutomirski
Date: Thu Jan 14 2021 - 11:32:25 EST


On Thu, Jan 14, 2021 at 6:51 AM Krzysztof Mazur <krzysiek@xxxxxxxxxxxx> wrote:
>
> On Thu, Jan 14, 2021 at 03:07:37PM +0100, Borislav Petkov wrote:
> > On Thu, Jan 14, 2021 at 01:36:57PM +0100, Krzysztof Mazur wrote:
> > > The OSFXSR must be set only on CPUs with SSE. There
> > > are some CPUs with 3DNow!, but without SSE and FXSR (like AMD
> > > Geode LX, which is still used in many embedded systems).
> > > So, I've changed that to:
> > >
> > > if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
> > > unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))
> >
> > Why?
> >
> > X86_CR4_OSFXSR won't ever be set on those CPUs but the test will be
> > performed anyway. So there's no need for boot_cpu_has().
>
> Because the MMX version should be always used on those CPUs, even without
> OSFXSR set. If the CPU does not support SSE, it is safe to
> call kernel_fpu_begin() without OSFXSR set.
> "!(cr4_read_shadow() & X86_CR4_OSFXSR)" will be always true on
> those CPUs, and without boot_cpu_has() MMX version will be never used.
>
> There are two cases:
>
> 3DNow! without SSE always use MMX version
> 3DNow! + SSE (K7) use MMX version only if FXSR is enabled
>
> Thanks.
>
> Best regards,
> Krzysiek
> -- >8 --
> Subject: [PATCH] x86/lib: don't use mmx_memcpy() too early
>
> The MMX 3DNow! optimized memcpy() is used very early,
> even before FPU is initialized in the kernel. It worked fine, but commit
> 7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR
> to default in kernel_fpu_begin()") broke that. After that
> commit the kernel_fpu_begin() assumes that FXSR is enabled in
> the CR4 register on all processors with SSE. Because memcpy() is used
> before FXSR is enabled, the kernel crashes just after "Booting the kernel."
> message. It affects all kernels with CONFIG_X86_USE_3DNOW (enabled when
> some AMD/Cyrix processors are selected) on processors with SSE
> (like AMD K7, which supports both MMX 3DNow! and SSE).
>
> Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()")
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # 5.8+
> Signed-off-by: Krzysztof Mazur <krzysiek@xxxxxxxxxxxx>
> ---
> arch/x86/lib/mmx_32.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c
> index 4321fa02e18d..70aa769570e6 100644
> --- a/arch/x86/lib/mmx_32.c
> +++ b/arch/x86/lib/mmx_32.c
> @@ -25,13 +25,20 @@
>
> #include <asm/fpu/api.h>
> #include <asm/asm.h>
> +#include <asm/tlbflush.h>
>
> void *_mmx_memcpy(void *to, const void *from, size_t len)
> {
> void *p;
> int i;
>
> - if (unlikely(in_interrupt()))
> + /*
> + * kernel_fpu_begin() assumes that FXSR is enabled on all processors
> + * with SSE. Thus, MMX-optimized version can't be used
> + * before the kernel enables FXSR (OSFXSR bit in the CR4 register).
> + */
> + if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) &&
> + unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR))))

This is gross. I realize this is only used for old CPUs that we don't
care about perf-wise, but this code is nonsense -- it makes absolutely
to sense to put this absurd condition here to work around
kernel_fpu_begin() bugs. If we want to use MMX, we should check MMX.
And we should also check the correct condition re: irqs. So this code
should be:

if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) {
kernel_fpu_begin_mask(FPU_MASK_XMM);
...

or we could improve code gen by adding a try_kernel_fpu_begin_mask()
so we can do a single call instead of two.

This also mostly fixes our little performance regression -- we'd make
kernel_fpu_begin() be an inline wrapper around
kernel_fpu_begin_mask(FPU_MASK_DEFAULTFP), and *that* would be
FPU_MASK_XYZMM on 64-bit and FPU_MASK_387 on 32-bit. (Okay, XYZMM
isn't a great name, but whatever.) And the EFI code can become
completely explicit: kernel_fpu_begin(FPU_MASK_ALL).

What do you all think? If you're generally in favor, I can write the
code and do a quick audit for other weird cases in the kernel.

Or we could flip the OSFSXR bit very early, I suppose.

--Andy