Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

From: Josh Poimboeuf
Date: Thu Sep 21 2017 - 12:18:19 EST


On Thu, Sep 21, 2017 at 05:35:11PM +0200, Ingo Molnar wrote:
>
> * Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> > > On 09/19/17 11:45, Josh Poimboeuf wrote:
> > > > For inline asm statements which have a CALL instruction, we list the
> > > > stack pointer as a constraint to convince GCC to ensure the frame
> > > > pointer is set up first:
> > > >
> > > > static inline void foo()
> > > > {
> > > > register void *__sp asm(_ASM_SP);
> > > > asm("call bar" : "+r" (__sp))
> > > > }
> > > >
> > > > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > > >
> > > > There's actually an easier way to achieve the same goal in GCC, without
> > > > causing trouble for clang. If we declare the stack pointer register
> > > > variable as a global variable, and remove the constraint altogether,
> > > > that convinces GCC to always set up the frame pointer before inserting
> > > > *any* inline asm.
> > > >
> > > > It basically acts as if *every* inline asm statement has a CALL
> > > > instruction. It's a bit overkill, but the performance impact should be
> > > > negligible.
> > > >
> > >
> > > Again, probably negligible, but why do we need a frame pointer just
> > > because we have a call assembly instruction?
> >
> > It's frame pointer convention. Without it, if dumping the stack from
> > the called function, a function will get skipped in the stack trace.
>
> BTW., could we perhaps relax this and simply phase out the frame pointer on x86,
> and simplify all our assembly in a cycle or two? ORC unwinder is working out very
> well so far. Live kernel patching can use ORC data just fine, and nothing else
> actually relies on frame pointers, right?
>
> That would give one more register to assembly code.
>
> I realize that we just rewrote a whole bunch of assembly code... but that was the
> price for ORC, in a way.

I think ORC isn't *quite* ready for livepatch yet, though it's close.
We could probably make it ready in a cycle or two.

However, I'm not sure whether we would want to remove livepatch support
for frame pointers yet:

- There might be some embedded livepatch users who can't deal with the
extra 3MB of RAM needed by ORC. (Though this is purely theoretical, I
don't actually know anybody with this problem. I suppose we could
float the idea on the livepatch/kpatch mailing lists and see if
anybody objects.)

- Objtool's ORC generation is working fine now, but objtool maintenance
is currently a little heavy-handed, and I'm currently the only person
who knows how to do it. I've got an idea brewing for how to improve
its maintainability with the help of compiler plugins, but until then,
if I get hit by a bus tomorrow, who's going to pick it up? It's nice
to have frame pointers as a backup solution for live patching.

But also, is this a problem that's even worth fixing? Do we have any
assembly code that would be noticeably better off with that extra
register? Most of the changes were:

- adding FRAME_BEGIN/FRAME_END to some asm functions;
- juggling register usage in a few crypto functions; and
- adding the stack pointer constraint to 15 or so inline asm functions.

I think those changes weren't all that disruptive to start with.

--
Josh