Re: [patch 08/41] x86/fpu: Restrict fpstate sanitizing to legacy components

From: Thomas Gleixner
Date: Sat Jun 12 2021 - 18:06:44 EST


On Sat, Jun 12 2021 at 00:12, Thomas Gleixner wrote:
> On Fri, Jun 11 2021 at 12:03, Andy Lutomirski wrote:
>> On 6/11/21 9:15 AM, Thomas Gleixner wrote:
>>> + *
>>> + * This is required for the legacy regset functions.
>>> + */
>>> +static void fpstate_sanitize_legacy(struct fpu *fpu)
>>> +{
>>> + struct fxregs_state *fx = &fpu->state.fxsave;
>>> + u64 xfeatures;
>>> +
>>> + if (!use_xsaveopt())
>>> + return;
>>
>> This is confusing, since we never use xsaveopt. It's also wrong -- see
>> above. How about just removing it?
>
> We do and this code is buggy because xsaves does not clear the component
> storage either. Neither does xsavec which we in fact do not use in the
> kernel.
>
> So here is how the different opcodes behave on a buffer filled with 0xff
> when looking at the first four 64-bit words of the buffer after doing an
> xrstor with all feature bits cleared:
>
> Intel SKLX
>
> XSAVE    000000000000037f 0000000000000000 0000000000000000 0000ffff00001f80
> XSAVEOPT ffffffffffffffff ffffffffffffffff ffffffffffffffff 0000ffff00001f80
> XSAVEC   ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> XSAVES   ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
>
> AMD ZEN2
>
> XSAVE    000000000000037f 0000000000000000 0000000000000000 0002ffff00001f80
> XSAVEOPT ffffffffffffffff ffffffffffffffff ffffffffffffffff 0002ffff00001f80
> XSAVEC   ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> XSAVES   ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
>
> I verified that all saved buffers have xstate.header.xstate_bv == 0
>
> So nothing about any of this is consistent and correct. But it magically
> works for unknown reasons.

What's even worse is the following. setup_init_fpu_buf() does:

	copy_kernel_to_xregs_booting(&init_fpstate.xsave);
	/*
	 * Dump the init state again. This is to identify the init state
	 * of any feature which is not represented by all zero's.
	 */
	copy_xregs_to_kernel_booting(&init_fpstate.xsave);

That comment is blatantly wrong with XSAVES/XRSTORS. init_fpstate is
initially all zeros and it stays that way with XRSTORS. Oh well.

And as the initialization values, at least for mxcsr_mask, differ between
AMD and Intel, hardcoding them is just wrong. Sigh...
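
For reference, the fourth 64-bit word in the dumps quoted above covers
bytes 24..31 of the legacy FXSAVE area: MXCSR sits in the low 32 bits and
MXCSR_MASK in the high 32 bits. A minimal userspace sketch of that
decoding follows. The struct and function names are made up for
illustration and are not kernel code.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not a kernel structure. */
struct mxcsr_pair {
	uint32_t mxcsr;		/* bytes 24..27 of the FXSAVE area */
	uint32_t mxcsr_mask;	/* bytes 28..31 of the FXSAVE area */
};

/*
 * Split the fourth 64-bit word of a (little-endian) FXSAVE/XSAVE legacy
 * area into MXCSR and MXCSR_MASK.
 */
static struct mxcsr_pair decode_mxcsr_word(uint64_t word3)
{
	struct mxcsr_pair p;

	p.mxcsr = (uint32_t)(word3 & 0xffffffffu);
	p.mxcsr_mask = (uint32_t)(word3 >> 32);
	return p;
}
```

Fed the values from the dumps, 0x0000ffff00001f80 (SKLX) and
0x0002ffff00001f80 (ZEN2) both give MXCSR 0x1f80, while MXCSR_MASK comes
out as 0x0000ffff and 0x0002ffff respectively.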

So we could save the state with XSAVE into a different buffer and copy
the components to the correct place in the compacted init_fpstate, but
the only interesting part of all this with any variant of XSAVE is the
legacy part. Everything else is just zeros (except for HDC and HWP
states which we do not use and they can be accessed via MSRs if the need
ever arises). We could just save the legacy part with fxsave and be done
with it.

Aside from that this whole _booting() stuff is complete nonsense. It's
completely sufficient to XRSTOR from an all-zeroes buffer which brings
every component into init state and then do all the other muck _after_
alternatives have been patched. Absolutely nothing uses FPU muck before
that point.

While staring at all this I figured out why that sanitizing does not and
_cannot_ touch MXCSR and MXCSR_MASK.

MXCSR and MXCSR_MASK are located in the FP component storage and used by
SSE. But MXCSR is also XSAVEd when AVX is in use.

So the sanitizing cannot touch it without checking whether AVX is in
use. This is really all well thought out in hardware _AND_ software.
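
A rough sketch of the check this implies, assuming the usual XSAVE
component numbering (x87 = bit 0, SSE = bit 1, AVX/YMM = bit 2). The
mask names mirror the kernel's, but the helper itself is hypothetical
and only illustrates the condition; it is not the eventual fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* XSAVE state-component bits, named after the kernel's masks. */
#define XFEATURE_MASK_FP	(1ULL << 0)
#define XFEATURE_MASK_SSE	(1ULL << 1)
#define XFEATURE_MASK_YMM	(1ULL << 2)

/*
 * Hypothetical helper: MXCSR/MXCSR_MASK in the legacy area may hold live
 * data if either SSE or AVX is marked in use in xstate_bv, so a sanitizer
 * may only reset them when both bits are clear.
 */
static bool mxcsr_may_be_live(uint64_t xstate_bv)
{
	return (xstate_bv & (XFEATURE_MASK_SSE | XFEATURE_MASK_YMM)) != 0;
}
```

With this, a header with only the x87 bit set reports MXCSR as not live,
while a header with only the YMM bit set reports it as live even though
the SSE bit is clear.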

Let me try to beat some more sense into this trainwreck.

Thanks,

tglx