Re: [PATCH v3 9/9] x86: jump-labels: use macros instead of inline assembly

From: Nadav Amit
Date: Sun Jun 10 2018 - 23:57:08 EST


at 6:29 PM, hpa@xxxxxxxxx wrote:

> On June 10, 2018 7:19:11 AM PDT, Nadav Amit <namit@xxxxxxxxxx> wrote:
>> Use assembly macros for jump-labels and call them from inline assembly.
>> This not only makes the code more readable, but also improves
>> compilation decision, specifically inline decisions which GCC base on
>> the number of new lines in inline assembly.
>>
>> As a result the code size is slightly increased.
>>
>> text data bss dec hex filename
>> 18163528 10226300 2957312 31347140 1de51c4 ./vmlinux before
>> 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+1128)
>>
>> And functions such as intel_pstate_adjust_policy_max(),
>> kvm_cpu_accept_dm_intr(), kvm_register_read() are inlined.
>>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>> Cc: x86@xxxxxxxxxx
>> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>> Cc: Kate Stewart <kstewart@xxxxxxxxxxxxxxxxxxx>
>> Cc: Philippe Ombredanne <pombredanne@xxxxxxxx>
>>
>> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
>> ---
>> arch/x86/include/asm/jump_label.h | 65 ++++++++++++++++++-------------
>> arch/x86/kernel/macros.S | 1 +
>> 2 files changed, 39 insertions(+), 27 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/jump_label.h
>> b/arch/x86/include/asm/jump_label.h
>> index 8c0de4282659..ea0633a41122 100644
>> --- a/arch/x86/include/asm/jump_label.h
>> +++ b/arch/x86/include/asm/jump_label.h
>> @@ -2,19 +2,6 @@
>> #ifndef _ASM_X86_JUMP_LABEL_H
>> #define _ASM_X86_JUMP_LABEL_H
>>
>> -#ifndef HAVE_JUMP_LABEL
>> -/*
>> - * For better or for worse, if jump labels (the gcc extension) are
>> missing,
>> - * then the entire static branch patching infrastructure is compiled
>> out.
>> - * If that happens, the code in here will malfunction. Raise a
>> compiler
>> - * error instead.
>> - *
>> - * In theory, jump labels and the static branch patching
>> infrastructure
>> - * could be decoupled to fix this.
>> - */
>> -#error asm/jump_label.h included on a non-jump-label kernel
>> -#endif
>> -
>> #define JUMP_LABEL_NOP_SIZE 5
>>
>> #ifdef CONFIG_X86_64
>> @@ -28,18 +15,27 @@
>>
>> #ifndef __ASSEMBLY__
>>
>> +#ifndef HAVE_JUMP_LABEL
>> +/*
>> + * For better or for worse, if jump labels (the gcc extension) are
>> missing,
>> + * then the entire static branch patching infrastructure is compiled
>> out.
>> + * If that happens, the code in here will malfunction. Raise a
>> compiler
>> + * error instead.
>> + *
>> + * In theory, jump labels and the static branch patching
>> infrastructure
>> + * could be decoupled to fix this.
>> + */
>> +#error asm/jump_label.h included on a non-jump-label kernel
>> +#endif
>> +
>> #include <linux/stringify.h>
>> #include <linux/types.h>
>>
>> static __always_inline bool arch_static_branch(struct static_key *key,
>> bool branch)
>> {
>> - asm_volatile_goto("1:"
>> - ".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
>> - ".pushsection __jump_table, \"aw\" \n\t"
>> - _ASM_ALIGN "\n\t"
>> - _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
>> - ".popsection \n\t"
>> - : : "i" (key), "i" (branch) : : l_yes);
>> + asm_volatile_goto("STATIC_BRANCH_GOTO l_yes=\"%l[l_yes]\" key=\"%c0\"
>> "
>> + "branch=\"%c1\""
>> + : : "i" (key), "i" (branch) : : l_yes);
>>
>> return false;
>> l_yes:
>> @@ -48,13 +44,8 @@ static __always_inline bool
>> arch_static_branch(struct static_key *key, bool bran
>>
>> static __always_inline bool arch_static_branch_jump(struct static_key
>> *key, bool branch)
>> {
>> - asm_volatile_goto("1:"
>> - ".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t"
>> - "2:\n\t"
>> - ".pushsection __jump_table, \"aw\" \n\t"
>> - _ASM_ALIGN "\n\t"
>> - _ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
>> - ".popsection \n\t"
>> + asm_volatile_goto("STATIC_BRANCH_JUMP_GOTO l_yes=\"%l[l_yes]\"
>> key=\"%c0\" "
>> + "branch=\"%c1\""
>> : : "i" (key), "i" (branch) : : l_yes);
>>
>> return false;
>> @@ -108,6 +99,26 @@ struct jump_entry {
>> .popsection
>> .endm
>>
>> +.macro STATIC_BRANCH_GOTO l_yes:req key:req branch:req
>> +1:
>> + .byte STATIC_KEY_INIT_NOP
>> + .pushsection __jump_table, "aw"
>> + _ASM_ALIGN
>> + _ASM_PTR 1b, \l_yes, \key + \branch
>> + .popsection
>> +.endm
>> +
>> +.macro STATIC_BRANCH_JUMP_GOTO l_yes:req key:req branch:req
>> +1:
>> + .byte 0xe9
>> + .long \l_yes - 2f
>> +2:
>> + .pushsection __jump_table, "aw"
>> + _ASM_ALIGN
>> + _ASM_PTR 1b, \l_yes, \key + \branch
>> + .popsection
>> +.endm
>> +
>> #endif /* __ASSEMBLY__ */
>>
>> #endif
>> diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
>> index bf8b9c93e255..161c95059044 100644
>> --- a/arch/x86/kernel/macros.S
>> +++ b/arch/x86/kernel/macros.S
>> @@ -13,3 +13,4 @@
>> #include <asm/paravirt.h>
>> #include <asm/asm.h>
>> #include <asm/cpufeature.h>
>> +#include <asm/jump_label.h>
>
> If the code size increases, do you have any metrics for improvement?
>
> That being said, I do like the readability improvements and the ability to
> use gas macros in assembly as opposed to cpp macros wrapped in different
> ways for C and assembly.

I didnât try to measure the performance impact of each patch independently.
I measured the paravirt patch impact, since it was relatively easy to see
that before the change a lot of page-table manipulation functions were not
inlined, and indeed I got 2% performance improvement for a microbenchmark
of repeated page-fault + MADV_DONTNEED.

For this patch, it might be slightly hard to see an effect on performance.
However, functions of kvm, btrfs, ext4, xen, and xfs and other are affected.

Sampling some changes shows that the compiler makes reasonable decisions
that can increase the code size. For example, with this patch
kvm_pmi_trigger_fn() gets kvm_pmu_deliver_pmi() inlined. Since
kvm_pmu_deliver_pmi() is used in a different object (x86.o), a non-inlined
instance of the function is produced as well, which causes the binary size
to (slightly) increase.

I think that the value of this series is in the overall impact. All it does
is prevent the compiler from making wrong decisions. This series can
presumably prevent __always_inline annotations such as in a recent patch
[1], in which the encountered issue, I suspect was caused by the use of
static_cpu_has().

Regards,
Nadav

[1] https://patchwork.kernel.org/patch/10450037/