Automated search for inline bloat

From: Denys Vlasenko
Date: Sun Jul 12 2015 - 10:05:55 EST


The scripts are in the attached tarball.

Is there interest in putting them in, say, scripts/inline_bloat/
in the kernel tree?

============================================
Inline hunting.

There are outrageously big inlines in the kernel. Finding them by hand
is inefficient, especially if you want to handle the worst of them first.

Let's automate it.

Inline hunting in header files is done by replacing "inline" with "noinline",
rebuilding vmlinux, and seeing how much smaller it becomes.
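
The replacement step can be sketched roughly as below. This is only an illustration, not the code from the tarball: the function names are made up, and it assumes the "inline" keyword sits on the line number recorded for the function.

```python
import re

# Matches the bare keyword "inline" only. "noinline", "__inline__" and
# "__always_inline" are left alone because \b needs a word boundary.
_INLINE_RE = re.compile(r"\binline\b")


def deinline_line(line: str) -> str:
    """Turn the first standalone 'inline' on this line into 'noinline'."""
    return _INLINE_RE.sub("noinline", line, count=1)


def deinline_in_source(text: str, lineno: int) -> str:
    """Return the header text with line `lineno` (1-based) deinlined."""
    lines = text.splitlines(keepends=True)
    lines[lineno - 1] = deinline_line(lines[lineno - 1])
    return "".join(lines)
```

After such an edit the tree is rebuilt and the resulting vmlinux is compared against vmlinux.original.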

(The accounting is more complex than a simple comparison of "size vmlinux"
before and after, since the deinlined function body is repeated in each
object file it is called from.)
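
One plausible way to do that accounting is sketched below. It is an assumption, not necessarily what the attached scripts compute: it supposes the function would eventually live in a single .c file, so every extra copy of the body in the noinline build is credited back. All names here are hypothetical.

```python
def estimated_saving(size_before: int, size_after: int,
                     body_size: int, copies: int) -> int:
    """Estimate bytes saved if the function existed as ONE out-of-line copy.

    size_before: code size of the original vmlinux
    size_after:  code size of the vmlinux built with "noinline"
    body_size:   size of one deinlined copy of the function body
    copies:      number of object files that each got their own copy
    """
    if copies < 1:
        raise ValueError("the function must be emitted at least once")
    measured = size_before - size_after
    # The noinline build still carries (copies - 1) redundant bodies;
    # a single shared copy would not, so add them back to the saving.
    return measured + (copies - 1) * body_size
```

With one copy the estimate is simply the measured difference; with more copies the raw "size" comparison understates the saving.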


HOWTO.

* Find a machine with lots of CPUs.
* Start with an empty work directory.
* Copy these scripts into it.
* git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
* cd linux && make oldconfig && make -j99 && cp vmlinux vmlinux.original
* Edit conf.py - this is crucial to optimally parallelize the next step,
  which consumes most of the time (usually a day or more).
* Run ./1start. Now parallel build jobs are running in linux.N/ dirs.
  N = conf.py::dir_count.
* Run ./2merge_and_sort_results:
        ...
        Total inlines to measure:22640, processed:15883
        Total inlines to measure:22640, processed:15889 (after 60 seconds)
        ...
  (This script generates the final result, inlines.log.measured.sorted,
  every minute. When all inlines are processed, it will stop.)

Result files are:

inlines_err.log - error messages from inline finding script
linux.N/inlines_err.log - each parallel job's stderr
inlines.log - all inlines
inlines.log.sorted - sorted by line count
inlines.log.filtered - selected for measurement (via conf.py::min_lines)
inlines.log.measured - measured code size change
inlines.log.measured.sorted - sorted, format is:

drivers/gpu/drm/radeon/radeon.h:2696:radeon_ring_write:7:46196:73
filename:lineno^^^^^^^^^^^^^^^^^^^^^|inline_name^^^^^^|^|saved|size_of_deinlined_fn
                                                       !lines_of_source_code
IOW:
Deinlining radeon_ring_write() shrinks kernel by 46196 bytes.
Deinlined function body is 73 bytes of code.
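
For post-processing, a record in this format can be split into named fields. A minimal sketch follows; the class and field names are my own, only the colon-separated layout comes from the description above.

```python
from typing import NamedTuple


class InlineRecord(NamedTuple):
    filename: str
    lineno: int
    name: str
    source_lines: int   # lines of source code in the inline
    saved_bytes: int    # how much the kernel shrinks when it is deinlined
    body_bytes: int     # size of the deinlined function body


def parse_record(line: str) -> InlineRecord:
    """Parse one line of inlines.log.measured.sorted."""
    # Split from the right so the five trailing fields are taken first.
    filename, lineno, name, source_lines, saved, body = \
        line.strip().rsplit(":", 5)
    return InlineRecord(filename, int(lineno), name,
                        int(source_lines), int(saved), int(body))
```

Usage: parse_record("drivers/gpu/drm/radeon/radeon.h:2696:radeon_ring_write:7:46196:73").saved_bytes gives 46196.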


Which config to use?

Well, we want to look at *all* code, so allyesconfig is a natural choice.

However, some options are clearly "heavy debugging" stuff.
IOW: many developers run their work machines with lock debugging and such,
but only a few would *constantly* use something which slows the kernel down
by a factor of 3. We don't care if an inlined function is "too big" only
when this sort of config option is in effect.

So, CONFIG_KASAN is off.

CONFIG_STAGING is also off: we probably needn't bother covering
semi-broken drivers.

CONFIG_CC_OPTIMIZE_FOR_SIZE=y (iow: -Os build) is useful because it eliminates
code padding and thus reduces random jitter in code sizes. However, gcc has
a problem where it spuriously deinlines functions marked "inline", and it's
much more pronounced with -Os:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

Therefore, CONFIG_OPTIMIZE_INLINING needs to be disabled with -Os.
It's a good idea anyway: when it's off, the build system turns "inline"
into "__always_inline", making the code size change from deinlining much
more robust.
And we do want to find bad inlines - for that, we should prevent gcc
from hiding them!

Nevertheless, a CONFIG_OPTIMIZE_INLINING=y build is also useful:
after it,

nm --size-sort vmlinux \
| grep -iF ' t ' \
| uniq -c | grep -v '^ *1 ' | sort -rn

nicely shows duplicate same-sized functions; most of them are bogus
deinlines by gcc.
If you plan to create a patch which forces their inlining (or you work
on fixing gcc!), this list is useful.
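
The same duplicate listing can be sketched in Python, which makes it easier to post-process. This is my own illustration, fed with the text output of "nm --size-sort vmlinux" (sizes are printed in hex). Unlike "uniq -c", it counts identical entries even when they are not adjacent.

```python
from collections import Counter


def duplicate_text_symbols(nm_output: str):
    """Return (count, size, type, name) for text symbols seen more than once."""
    counts = Counter()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue                      # skip blank or malformed lines
        size_hex, sym_type, name = parts
        if sym_type not in ("t", "T"):
            continue                      # same filter as grep -iF ' t '
        counts[(int(size_hex, 16), sym_type, name)] += 1
    dups = [(n, size, sym_type, name)
            for (size, sym_type, name), n in counts.items() if n > 1]
    # Most frequently duplicated first, like "sort -rn" on the counts.
    dups.sort(key=lambda d: (-d[0], -d[1], d[3]))
    return dups
```

Each entry says how many same-named, same-sized copies of a function ended up in vmlinux.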

CONFIG_CMDLINE_BOOL should be disabled, otherwise test boots in e.g.
qemu may fail.
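
A quick sanity check of a .config against the choices discussed so far could look like this. It is a sketch with hypothetical names; the option list only mirrors the text above (for an -Os measurement build) and is not part of the attached scripts.

```python
# Option -> whether this text wants it enabled for the -Os measurement build.
RECOMMENDED = {
    "CONFIG_KASAN": False,
    "CONFIG_STAGING": False,
    "CONFIG_CC_OPTIMIZE_FOR_SIZE": True,
    "CONFIG_OPTIMIZE_INLINING": False,
    "CONFIG_CMDLINE_BOOL": False,
}


def enabled_options(config_text: str) -> set:
    """Collect options set to y or m; '# CONFIG_X is not set' lines are skipped."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def config_mismatches(config_text: str) -> list:
    """Return the options whose state differs from RECOMMENDED."""
    enabled = enabled_options(config_text)
    return [opt for opt, want in RECOMMENDED.items()
            if (opt in enabled) != want]
```

Usage: config_mismatches(open(".config").read()) returns an empty list when the config matches.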

[What else to disable?]
[CONFIG_FRAME_POINTER?]


What do we miss?

Macros. (They are also "inlines" of sorts).
Some macros even define inline functions (!). Example:
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
	static inline notrace int ftrace_get_offsets_##call( \
		struct ftrace_data_offsets_##call *__data_offsets, proto) \
	{ \
		int __data_size = 0; \
		int __maybe_unused __item_length; \
		struct ftrace_raw_##call __maybe_unused *entry; \
	 \
		tstruct; \
	 \
		return __data_size; \
	}

Bad coding practices of this type:
Fourteen separate *.c files, each with
#include "echoaudio.c" // contains static functions, they get duplicated...
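
Cases like this one are easy to search for. A minimal sketch (the function name is hypothetical, and it looks only at quoted includes ending in ".c"):

```python
import re

# A preprocessor include of a quoted file whose name ends in ".c".
_C_INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+\.c)"', re.MULTILINE)


def included_c_files(source_text: str) -> list:
    """Return the names of all *.c files pulled in via #include "..."."""
    return _C_INCLUDE_RE.findall(source_text)
```

Running this over every *.c file in the tree and counting how often each included name appears would show which .c files get their static functions duplicated.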

Attachment: inline_hunting_v1.tar.gz
Description: GNU Zip compressed data