Re: [PATCH 4/6] Unsuck "x86/entry/64: Create a percpu SYSCALL entry trampoline"

From: Josh Poimboeuf
Date: Sat Dec 02 2017 - 10:18:51 EST


On Thu, Nov 30, 2017 at 10:29:44PM -0800, Andy Lutomirski wrote:
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index caf74a1bb3de..28f4e7553c26 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -180,14 +180,24 @@ ENTRY(entry_SYSCALL_64_trampoline)
>
> /*
> * x86 lacks a near absolute jump, and we can't jump to the real
> - * entry text with a relative jump, so we fake it using retq.
> + * entry text with a relative jump. We could push the target
> + * address and then use retq, but this destroys the pipeline on
> + * many CPUs (wasting over 20 cycles on Sandy Bridge). Instead,
> + * spill RDI and restore it in a second-stage trampoline.
> */
> - pushq $entry_SYSCALL_64_after_hwframe
> - retq
> + pushq %rdi
> + movq $entry_SYSCALL_64_stage2, %rdi
> + jmp *%rdi
> END(entry_SYSCALL_64_trampoline)
>
> .popsection
>
> +ENTRY(entry_SYSCALL_64_stage2)
> + UNWIND_HINT_EMPTY
> + popq %rdi
> + jmp entry_SYSCALL_64_after_hwframe
> +END(entry_SYSCALL_64_stage2)
> +
> ENTRY(entry_SYSCALL_64)
> UNWIND_HINT_EMPTY
> /*

Another crazy idea:

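call 1f                                          /* push the address of 1: */
1: movq $entry_SYSCALL_64_after_hwframe, (%rsp) /* overwrite it with the real target */
ret                                              /* pop the target and jump there */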

Does that fix the regression?

--
Josh