Re: [PATCH v10 13/28] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state

From: Thomas Gleixner
Date: Mon Oct 04 2021 - 15:04:05 EST


On Sun, Oct 03 2021 at 22:39, Chang Seok Bae wrote:
> On Oct 1, 2021, at 13:20, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Looking at the changelog of the patch to delay XSTATE [1] load:
>
> This gives the kernel the potential to skip loading FPU state for tasks
> that stay in kernel mode, or for tasks that end up with repeated
> invocations of kernel_fpu_begin() & kernel_fpu_end().

Correct.

> But I think XFD state is different from XSTATE. There is no use case for
> XFD-enabled features in kernel mode.

Correct, but your patch does not ensure that XFD features are disabled
on context switch. You write the XFD mask of the next task when it
differs frome the XFD mask of the previous task. So we have the following:

prev XFD next XFD
DISABLED DISABLED XFD features stay disabled
ENABLED DISABLED XFD features are disabled
DISABLED ENABLED XFD features are enabled
ENABLED ENABLED XFD features stay enabled

So it still runs in the kernel with XFD features enabled including
interrupts, soft interupts, exceptions and NMIs. So what's the problem
when it does a user -> kernel -> user transition with the user XFD on?

> So, XFD state was considered to be switched under switch_to() just
> like other user states. E.g. user FSBASE is switched here as kernel
> does not use it.

That's not really a justification.

> But user GSBASE is loaded at returning to userspace.

And so is XSTATE

> Potentially, it is also beneficial as XFD-armed states will hold
> INIT-state [3]:
>
> If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i, the
> instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1; instead,
> it saves bit i of XSTATE_BV field of the XSAVE header as 0 (indicating
> that the state component is in its initialized state).

How does that matter? The point is that if the FPU registers are
unmodified then a task can return to user space without doing anything
even if it went through five context switches. So how is XFD any
different?

Where is the kernel doing XSAVE / XSAVES:

1) On context switch which sets TIF_NEED_FPU_LOAD

Once TIF_NEED_FPU_LOAD is set the kernel does not do XSAVES in
the context of the task simply because it knows that the content
is in the memory buffer.

2) In signal handling

Only happens when TIF_NEED_FPU_LOAD == 0

Where is the kernel doing XRSTOR / XRSTORS:

1) On return to user space if the FPU registers are not up to date

So this can restore XFD as well

2) In signal handling and related functions

Only happens when TIF_NEED_FPU_LOAD == 0

So what's the win?

No wrmsrl() on context switch, which means for the user -> kthread ->
user context switch scenario for which the register preserving is
optimized you spare two wrmsrl() invocations, run less code with less
conditionals.

What's the price?

A few trivial XFD sanity checks for debug enabled kernels to ensure that
XFD is correct on XSAVE[S] and XRSTOR[S], which have no runtime overhead
on production systems.

Even if we decide that these checks should be permanent then they happen
in code pathes which are doing a slow X* operation anyway.

Thanks,

tglx