Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch

From: Jason A. Donenfeld
Date: Fri Jun 15 2018 - 16:31:39 EST


On Fri, Jun 15, 2018 at 9:34 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> Didn't we recently do a bunch of crypto patches to help with this?
>
> I think they had the pattern:
>
> 	kernel_fpu_begin();
> 	for (units-of-work) {
> 		do_unit_of_work();
> 		if (need_resched()) {
> 			kernel_fpu_end();
> 			cond_resched();
> 			kernel_fpu_begin();
> 		}
> 	}
> 	kernel_fpu_end();

Right, so that's the thing -- this is an optimization easily available
to individual crypto primitives. But I'm interested in applying this
kind of optimization to an entire queue of, say, tiny packets, where
each packet is processed individually. Or, to a cryptographic
construction, where several different primitives are used, such that
it'd be worthwhile to avoid the performance hit of an end()/begin()
pair between each and every one of them.
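
To make the packet-queue case concrete, here is a rough sketch of what I
have in mind. It is a userspace mock, not kernel code: kernel_fpu_begin(),
kernel_fpu_end(), need_resched() and cond_resched() are stubbed out so it
compiles on its own, and struct packet, encrypt_packet_simd() and the two
process_queue_*() helpers are hypothetical names invented for illustration.
The counter only tracks how many times the FPU section is entered.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-ins (assumptions) so this compiles outside the kernel.
 * The real kernel_fpu_begin()/kernel_fpu_end() save and restore FPU state;
 * here they only count how often the FPU section is entered.
 */
static unsigned int fpu_transitions;
static bool fpu_held;

static void kernel_fpu_begin(void) { fpu_held = true; fpu_transitions++; }
static void kernel_fpu_end(void)   { fpu_held = false; }
static bool need_resched(void)     { return false; } /* stub: never asks */
static void cond_resched(void)     { }               /* stub: no-op */

/* Hypothetical tiny packet. */
struct packet {
	unsigned char data[64];
	size_t len;
};

/* Hypothetical stand-in for a SIMD crypto routine; just XORs the payload. */
static void encrypt_packet_simd(struct packet *p)
{
	for (size_t i = 0; i < p->len && i < sizeof(p->data); i++)
		p->data[i] ^= 0x5a;
}

/* One begin()/end() pair per packet: n FPU section entries for n packets. */
static void process_queue_per_packet(struct packet *q, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		kernel_fpu_begin();
		encrypt_packet_simd(&q[i]);
		kernel_fpu_end();
	}
}

/*
 * One begin()/end() pair around the whole queue, dropping the FPU only
 * when a reschedule is requested, as in the quoted pattern.
 */
static void process_queue_batched(struct packet *q, size_t n)
{
	if (n == 0)
		return;

	kernel_fpu_begin();
	for (size_t i = 0; i < n; i++) {
		encrypt_packet_simd(&q[i]);
		if (need_resched()) {
			kernel_fpu_end();
			cond_resched();
			kernel_fpu_begin();
		}
	}
	kernel_fpu_end();
}
```

With the stubs above, a queue of n packets enters the FPU section n times
in the per-packet variant and once in the batched variant. How much that
saves on real hardware is something that would have to be measured.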