Re: [PATCH 00/33] Compile-time stack metadata validation

From: Peter Zijlstra
Date: Mon Feb 15 2016 - 11:50:23 EST


On Mon, Feb 15, 2016 at 10:31:34AM -0600, Josh Poimboeuf wrote:
> On Fri, Feb 12, 2016 at 09:10:11PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 12, 2016 at 12:32:06PM -0600, Josh Poimboeuf wrote:
> > > What I actually see in the listing is:
> > >
> > > decl __percpu_prefix:__preempt_count
> > > je 1f:
> > > ....
> > > 1:
> > > call ___preempt_schedule
> > >
> > > So it puts the "call ___preempt_schedule" in the slow path.
> >
> > Ah yes indeed. Same difference though.
> >
> > > I also don't see how that would be related to the use of the asm
> > > statement in the __preempt_schedule() macro. Doesn't the use of
> > > unlikely() in preempt_enable() put the call in the slow path?
> >
> > Sadly no, unlikely() and asm_goto don't work well together. But the slow
> > path or not isn't the reason we do the asm call thing.
> >
> > > #define preempt_enable() \
> > > do { \
> > > 	barrier(); \
> > > 	if (unlikely(preempt_count_dec_and_test())) \
> > > 		preempt_schedule(); \
> > > } while (0)
> > >
> > > Also, why is the thunk needed? Any reason why preempt_enable() can't be
> > > called directly from C?
> >
> > That would make the call-site save registers and increase the size of
> > every preempt_enable(). By using the thunk we can do callee saved
> > registers and avoid blowing up the call site.
>
> So is the goal to optimize for size?

General performance impact of preempt_enable().

> If I replace the calls to
> __preempt_schedule[_notrace]() with real C calls and remove the thunks,
> it only adds about 2k to vmlinux.

That's less than I had expected, but probably still worth it.

And is that added text purely in the slow path? We really want to avoid
putting any more register pressure on the preempt_enable() call sites.
The single memop and Jcc is about as fast as we can get, and we spent
quite a bit of effort getting there.
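
To illustrate the shape being described (one decrement, one conditional
branch, and an out-of-line slow path), here is a minimal userspace sketch.
It is not kernel code: every sketch_* name is hypothetical, the counter is
a plain global rather than a per-cpu variable, and folding a "no reschedule
needed" flag into the top bit of the count is an assumption made here so
that a single zero test can cover both conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: top bit set means "no reschedule needed". */
#define SKETCH_NO_RESCHED 0x80000000u

/* Compiler-only barrier (GCC/Clang syntax). */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in for the per-cpu preempt count; low bits hold nesting depth. */
static unsigned int sketch_count = SKETCH_NO_RESCHED;

/* Counts slow-path entries so the behaviour can be observed. */
static unsigned int sketch_slow_calls;

/* Slow path: kept out of line and marked cold so call sites stay small. */
__attribute__((noinline, cold))
static void sketch_preempt_schedule(void)
{
	sketch_slow_calls++;
	/* The request has been handled; set the flag bit again. */
	sketch_count |= SKETCH_NO_RESCHED;
}

/* Request a reschedule by clearing the flag bit. */
static inline void sketch_set_need_resched(void)
{
	sketch_count &= ~SKETCH_NO_RESCHED;
}

static inline void sketch_preempt_disable(void)
{
	sketch_count++;
	sketch_barrier();
}

/* True only when nesting reaches zero AND a reschedule was requested. */
static inline bool sketch_count_dec_and_test(void)
{
	/* An enable must be paired with an earlier disable. */
	assert((sketch_count & ~SKETCH_NO_RESCHED) > 0);
	return --sketch_count == 0;
}

/* Fast path: one decrement, one test, one rarely-taken branch. */
static inline void sketch_preempt_enable(void)
{
	sketch_barrier();
	if (__builtin_expect(sketch_count_dec_and_test(), 0))
		sketch_preempt_schedule();
}
```

In this sketch the slow path is entered only when the outermost
sketch_preempt_enable() runs after sketch_set_need_resched(); nested or
unrequested enables fall straight through. It does not model the register
saving that the thunk is there for.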

> There are two ways to fix the warnings:
>
> 1. get rid of the thunks and call the C functions directly; or
>
> 2. add the stack pointer to the asm() statement output operand list to
> ensure a stack frame gets created in the caller function before the
> call. (Note this still allows the thunks to do callee saved registers.)
>
> I like #1 better, but maybe I'm still missing the point of the thunks.

Ingo, Linus?