Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch

From: Rik van Riel
Date: Fri Jun 15 2018 - 14:40:11 EST


On Fri, 2018-06-15 at 11:31 -0700, Andy Lutomirski wrote:
> On Fri, Jun 15, 2018 at 6:11 AM Jason A. Donenfeld <Jason@xxxxxxxxx>
> wrote:
> >
> > Hi Andy & folks,
> >
> > Lots of crypto routines look like this:
> >
> > kernel_fpu_begin();
> > encrypt();
> > kernel_fpu_end();
> >
> > If you call such a routine twice, you get:
> >
> > kernel_fpu_begin();
> > encrypt();
> > kernel_fpu_end();
> > kernel_fpu_begin();
> > encrypt();
> > kernel_fpu_end();
> >
> > In a loop this looks like:
> >
> > for (thing) {
> > kernel_fpu_begin();
> > encrypt(thing);
> > kernel_fpu_end();
> > }
> >
> > This is obviously very bad, because begin() and end() are slow, so
> > WireGuard does the obvious:
> >
> > kernel_fpu_begin();
> > for (thing)
> > encrypt(thing);
> > kernel_fpu_end();
> >
> > This is fine and well, and the crypto API I'm working on will
> > enable
> > this to be done in a clear way, but I do wonder if maybe this is
> > not
> > something that should be happening at the level of the caller, but
> > rather in the fpu functions themselves. Namely, what are your
> > thoughts
> > on modifying kernel_fpu_end() so that it doesn't actually restore
> > the
> > state, but just reenables preemption and marks that on the next
> > context switch, the state should be restored? Then, essentially,
> > kernel_fpu_begin() and end() become free after the first usage of
> > kernel_fpu_begin().
> >
> > Is this something feasible? I know that performance-wise, I'm
> > really
> > gaining a lot from hoisting those functions out of the loops, and
> > API
> > wise, it'd be slightly simpler to implement if I didn't have to all
> > for that hoisting.
>
> Hi Jason-
>
> Funny you asked. This has been discussed a few times, although not
> quite in the form you imagined. The idea that we've tossed around is
> to restore FPU state on return to user mode. Roughly, we'd introduce
> a new thread flag TIF_FPU_UNLOADED (name TBD).
> prepare_exit_to_usermode() would notice this flag, copy the fpstate
> to
> fpregs, and clear the flag. (Or maybe exit_to_usermode_loop() -- No
> one has quite thought it through, but I think it should be outside
> the
> loop.) We'd update all the FPU accessors to understand the flag.
> We'd probably have a helper like:
>
> void unload_user_fpstate(void)
>
> that would do nothing if TIF_FPU_UNLOADED is set. If
> TIF_FPU_UNLOADED
> is clear, it would move the state to memory and set TIF_FPU_UNLOADED.
> We'd call unload_user_fpstate() during context switch in the prev
> task
> context and in kernel_fpu_begin().
>
> The one major complication I know of is that PKRU is different from
> all other FPU state in that it needs to be loaded when *kernel* code
> does any user memory access. So we'd need to make sure that context
> switches load the new task's PKRU eagerly. Using WRPKRU is easy,
> but,
> unless we do something very clever, actually finding PKRU in the
> in-memory fpstate image may be slightly nontrivial.

IIRC the in-kernel FPU state always has every FPU field
at a constant location.

> I suppose we could eagerly load the new FPU state on context
> switches,
> but I'm not sure I see any point. We'd still have to load it when we
> switch back to user mode, so it would be slower but not necessarily
> any simpler.
>
> I'd love to review patches to make this change, but I don't have the
> bandwidth to write them right now.

I can take a look at this, I am already looking at
some context switch code right now, anyway.

I also should still have the TIF_FPU_UNLOADED patches
(or whatever the flag was called) code around somewhere.

--
All Rights Reversed.

Attachment: signature.asc
Description: This is a digitally signed message part