Re: [PATCH RFC] x86: enforce inlining for atomics

From: Hagen Paul Pfeifer
Date: Tue Apr 21 2015 - 18:57:35 EST


* Ingo Molnar | 2015-04-21 09:42:12 [+0200]:

Hey Ingo,

>So the thing is that allyesconfig turns on -Os:
>
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y

CONFIG_CC_OPTIMIZE_FOR_SIZE seems to have no effect. The only option which
makes a difference is CONFIG_OPTIMIZE_INLINING! But this is not a big surprise:
*disabling* CONFIG_OPTIMIZE_INLINING substitutes _all_ inlines with
__attribute__((always_inline)).
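
To illustrate the substitution, here is a simplified user-space sketch. It is
not the kernel header: demo_inline and DEMO_INLINE_FORCED are hypothetical
names standing in for the kernel's redefinition of the inline keyword, and
the real header depends on further conditions not shown here.

```c
/*
 * Simplified sketch, not the kernel header. demo_inline is a hypothetical
 * stand-in for the kernel's redefinition of the "inline" keyword.
 */

/* Uncomment to mimic CONFIG_OPTIMIZE_INLINING=y. */
/* #define CONFIG_OPTIMIZE_INLINING 1 */

#ifndef CONFIG_OPTIMIZE_INLINING
/* Option unset: every "inline" is forced, gcc's heuristic has no say. */
# define demo_inline inline __attribute__((__always_inline__))
# define DEMO_INLINE_FORCED 1
#else
/* Option set: "inline" is only a hint, gcc's heuristic decides. */
# define demo_inline inline
# define DEMO_INLINE_FORCED 0
#endif

/* Any function marked with the macro picks up the selected behaviour. */
static demo_inline int add_one(int v)
{
	return v + 1;
}
```

With the option left undefined, as above, add_one() carries always_inline and
gcc has to inline it at every call site. With the define enabled, gcc is free
to emit an out-of-line copy in each object file that uses it.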

The help text says "If unsure, say N.", which results in configurations with
always_inline.


So I tested again. With CONFIG_OPTIMIZE_INLINING unset, the result looks
fine:

show_temp: 59 duplicates
char2uni: 52 duplicates
uni2char: 52 duplicates
sd_probe: 49 duplicates
sd_driver_init: 48 duplicates
sd_driver_exit: 48 duplicates
usb_serial_module_exit: 47 duplicates
[...]


We see ordinary "template" reuse of common driver code without renaming the
copied statics. But when compiled with CONFIG_OPTIMIZE_INLINING=y, gcc does
not respect the inline hints:

atomic_inc: 544 duplicates
rcu_read_unlock: 453 duplicates
rcu_read_lock: 383 duplicates
get_dma_ops: 271 duplicates
arch_local_irq_restore: 258 duplicates
atomic_dec: 215 duplicates
kzalloc: 185 duplicates
test_and_set_bit: 156 duplicates
cpumask_check: 148 duplicates
cpumask_next: 146 duplicates
list_del: 131 duplicates
kref_get: 126 duplicates
test_and_clear_bit: 122 duplicates
brelse: 122 duplicates
schedule_work: 122 duplicates
netif_tx_stop_queue: 115 duplicates
atomic_dec_and_test: 107 duplicates
dma_mapping_error: 105 duplicates
list_del_init: 101 duplicates
netif_stop_queue: 100 duplicates
arch_local_save_flags: 98 duplicates
tasklet_schedule: 76 duplicates
clk_prepare_enable: 71 duplicates
init_completion: 69 duplicates
pskb_may_pull: 67 duplicates
[...]

Again, the gcc version used is "gcc (Debian 4.9.2-10) 4.9.2", so it is neither
outdated nor a legacy one. The inline heuristic seems really broken in some
places. Is it possible that gcc is confused by the inline assembler parts,
which throw off its internal scoring system?

I suggest the following: I prepare a patch series for the most obvious
candidates, substituting inline with __always_inline (probably ~50
functions). Each subsystem maintainer can check and ACK the patch. This has the
benefit that gcc remains responsible for the inlining decision at all other
locations. Enforcing inlining via __always_inline for all inline-marked
functions is probably too harsh!? In 2015 gcc is still not able to inline
single-line statements - that's strange.
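
A rough user-space sketch of what such a substitution would look like for one
candidate. All demo_* names are hypothetical, and the body uses gcc's
__atomic builtins as an assumption to keep the sketch portable; the x86 kernel
code uses inline assembler instead.

```c
/* Hypothetical user-space stand-ins; not the kernel's definitions. */
#define demo_always_inline inline __attribute__((__always_inline__))

typedef struct {
	int counter;
} demo_atomic_t;

/* Before: plain "inline" is a hint, gcc may emit an out-of-line copy. */
static inline void demo_atomic_inc_hint(demo_atomic_t *v)
{
	__atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
}

/* After: inlining is enforced at every call site. */
static demo_always_inline void demo_atomic_inc(demo_atomic_t *v)
{
	__atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
}

/* Helper to read the counter back, also force-inlined. */
static demo_always_inline int demo_atomic_read(demo_atomic_t *v)
{
	return __atomic_load_n(&v->counter, __ATOMIC_SEQ_CST);
}
```

Only the annotation changes; the function body and the callers stay as they
are, so each such patch should be easy to review per subsystem.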

Linus, ack?

Hagen