Re: [PATCH 14/22] x86/fpu: Eager switch PKRU state

From: Sebastian Andrzej Siewior
Date: Fri Mar 08 2019 - 13:08:58 EST


On 2019-02-25 10:16:24 [-0800], Dave Hansen wrote:
> On 2/21/19 3:50 AM, Sebastian Andrzej Siewior wrote:
> > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > index 67e4805bccb6f..05f6fce62e9f1 100644
> > --- a/arch/x86/include/asm/fpu/internal.h
> > +++ b/arch/x86/include/asm/fpu/internal.h
> > @@ -562,8 +562,24 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
> > */
> > static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
> > {
> > - if (static_cpu_has(X86_FEATURE_FPU))
> > - __fpregs_load_activate(new_fpu, cpu);
> > + struct pkru_state *pk;
> > + u32 pkru_val = 0;
> > +
> > + if (!static_cpu_has(X86_FEATURE_FPU))
> > + return;
> > +
> > + __fpregs_load_activate(new_fpu, cpu);
>
> This is still a bit light on comments.
>
> Maybe:
> /* PKRU state is switched eagerly because... */

okay, will update.

> > + if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
> > + return;
> > +
> > + if (current->mm) {
> > + pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> > + WARN_ON_ONCE(!pk);
>
> This can trip on us if the 'init optimization' is in play because
> get_xsave_addr() checks xsave->header.xfeatures. That's unlikely today
> because we usually set PKRU to a restrictive value. But, it's also not
> *guaranteed*.
>
> Userspace could easily do an XRSTOR that puts PKRU back in its init
> state if it wanted to, then this would end up with pk==NULL.
>
> We might actually want a selftest that *does* that. I don't think we
> have one.

So you are saying that the above warning might trigger and be "okay"?
My understanding is that the in-kernel XSAVE will always save everything
so we should never "lose" the XFEATURE_PKRU no matter what user space
does.

So as a test case you want
xsave (-1 & ~XFEATURE_PKRU)
xrestore (-1 & ~XFEATURE_PKRU)

in userland and then a context switch to see if the warning above
triggers?
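
For illustration, here is a minimal user-space sketch of the check being
discussed. It is a model under stated assumptions, not kernel code: the
struct and all model_* names are hypothetical, and only the PKRU
component is modelled. It shows why a cleared PKRU bit in
header.xfeatures makes the lookup return NULL, and what value the hunk
above would then pass to __write_pkru().

```c
#include <stddef.h>
#include <stdint.h>

/* Bit number of the PKRU component in the XSAVE feature mask. */
#define XFEATURE_PKRU		9
#define XFEATURE_MASK_PKRU	(1ULL << XFEATURE_PKRU)

/* Hypothetical, heavily simplified stand-in for the XSAVE buffer. */
struct model_xsave {
	uint64_t xfeatures;	/* models xsave->header.xfeatures */
	uint32_t pkru;		/* models the saved PKRU component */
};

/*
 * Models the get_xsave_addr() check for PKRU only: if the feature bit
 * is clear, the component is in its init state and there is no saved
 * data to point at, so NULL is returned.
 */
static uint32_t *model_get_pkru_addr(struct model_xsave *xs)
{
	if (!(xs->xfeatures & XFEATURE_MASK_PKRU))
		return NULL;
	return &xs->pkru;
}

/*
 * Models the value selection in switch_fpu_finish() from the patch:
 * 0 for tasks without an mm, and also 0 when the lookup returns NULL.
 */
static uint32_t model_pick_pkru(struct model_xsave *xs, int has_mm)
{
	uint32_t pkru_val = 0;

	if (has_mm) {
		uint32_t *pk = model_get_pkru_addr(xs);

		if (pk)
			pkru_val = *pk;
	}
	return pkru_val;
}
```

In this model a user task whose PKRU bit is clear ends up with the same
value (0) as a kernel thread, which is the case the WARN_ON_ONCE() would
flag.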

> > + if (pk)
> > + pkru_val = pk->pkru;
> > + }
> > + __write_pkru(pkru_val);
> > }
>
> A comment above __write_pkru() would be nice to say that it only
> actually does the slow instruction on changes to the value.

Could we please not do this? It would be a comment above just one of the
callers of that function, and we have two or three of them. And we have
that comment already within __write_pkru().
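
For readers without the tree at hand, the behaviour that comment
describes can be sketched in user space roughly as follows. This is a
model only: the register is a plain variable, the model_* names are
hypothetical, and the counter exists just to make the skipped writes
visible.

```c
#include <stdint.h>

/* Stands in for the PKRU register; starts at 0 in this model. */
static uint32_t model_pkru_reg;

/* Counts how often the modelled (slow) WRPKRU path is taken. */
static unsigned int model_wrpkru_count;

static uint32_t model_rdpkru(void)
{
	return model_pkru_reg;
}

static void model_wrpkru(uint32_t pkru)
{
	model_pkru_reg = pkru;
	model_wrpkru_count++;
}

/*
 * Write-on-change: read the current value first and skip the
 * expensive write when it would not change anything.
 */
static void model_write_pkru(uint32_t pkru)
{
	if (pkru == model_rdpkru())
		return;
	model_wrpkru(pkru);
}
```

Calling model_write_pkru() twice in a row with the same value bumps the
counter only once, which is why back-to-back switches between tasks with
an identical PKRU value stay cheap.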

> BTW, this has the implicit behavior of always trying to do a
> __write_pkru(0) on switches to kernel threads. That seems a bit weird
> and it is likely to impose WRPKRU overhead on switches between user and
> kernel threads.
>
> The 0 value is also the most permissive, which is not great considering
> that user mm's can be active in the page tables when running kernel
> threads if we're being lazy.
>
> Seems like we should either leave PKRU alone or have 'init_pkru_value'
> be the default. That gives good security properties and is likely to
> match the application value, removing the WRPKRU overhead.

Last time we talked about this we agreed (or this was my impression) that
0 should be written so that the kernel thread should always be able to
write to user space in case it borrowed its mm (otherwise it has none
and it would fail anyway).
We didn't want to leave PKRU alone because the outcome (whether or not
the write by the kernel thread succeeds) should not depend on the last
running task (and be random) but deterministic.

I am personally open to whichever outcome you decide on :) If you want to
use `init_pkru_value' instead of 0 then I can change this. If you want to
skip the possible update for kernel threads then okay but maybe this
should be documented somehow.
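
To make the three options comparable, here is a small user-space sketch
of what a kernel thread would run with under each one. It is a model,
not a proposed patch: the enum and the model_* names are hypothetical,
and the init value assumes the default of access-disable for keys 1 to
15 (key 0 left open), which is what I believe init_pkru_value is set to
by default.

```c
#include <stdint.h>

/* PKRU holds two bits per key; bit 2*k is the access-disable bit. */
#define MODEL_PKRU_AD_KEY(k)	(1u << ((k) * 2))

/* The three options from this thread (hypothetical names). */
enum model_kthread_policy {
	MODEL_WRITE_ZERO,	/* what the patch does today */
	MODEL_LEAVE_ALONE,	/* keep whatever the previous task had */
	MODEL_WRITE_INIT,	/* use the init value as the default */
};

/* Assumed default: access disabled for keys 1..15, key 0 allowed. */
static uint32_t model_init_pkru_value(void)
{
	uint32_t val = 0;

	for (int key = 1; key <= 15; key++)
		val |= MODEL_PKRU_AD_KEY(key);
	return val;
}

/* PKRU value a kernel thread runs with after the context switch. */
static uint32_t model_kthread_pkru(enum model_kthread_policy policy,
				   uint32_t prev_task_pkru)
{
	switch (policy) {
	case MODEL_WRITE_ZERO:
		return 0;
	case MODEL_LEAVE_ALONE:
		return prev_task_pkru;
	case MODEL_WRITE_INIT:
		return model_init_pkru_value();
	}
	return 0;
}
```

Only MODEL_LEAVE_ALONE depends on the previously running task, which is
the non-determinism mentioned above. The other two are deterministic and
differ in how permissive the resulting value is.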

Sebastian