Re: [PATCH 0/4] kbuild: build speed improvement of CONFIG_TRIM_UNUSED_KSYMS

From: Masahiro Yamada
Date: Thu Feb 25 2021 - 14:00:02 EST


On Fri, Feb 26, 2021 at 2:20 AM Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
>
> On Fri, 26 Feb 2021, Masahiro Yamada wrote:
>
> >
> > Now CONFIG_TRIM_UNUSED_KSYMS is revived, but Linus is still unhappy
> > about the build speed.
> >
> > I re-implemented this feature, and the build time cost is now
> > at an almost unnoticeable level.
> >
> > I hope this makes Linus happy.
>
> :-)
>
> I'm surprised to see that Linus is using this feature. When disabled
> (the default) this should have had no impact on the build time.

Linus does not use this feature himself, but he does run build tests.
After he pulled the module subsystem pull request in this merge window,
allmodconfig started enabling CONFIG_TRIM_UNUSED_KSYMS.


> This feature provides a nice security advantage by significantly
> reducing the kernel input surface. And people are using that also to
> better control what third party vendors can and cannot do with a distro kernel,
> etc. But that's not the reason why I implemented this feature in the
> first place.
>
> My primary goal was to efficiently reduce the kernel binary size using
> LTO even with kernel modules enabled.


Clang LTO landed in this merge window.

Do you think it will reduce the kernel binary size?
No, the opposite.

CONFIG_LTO_CLANG cannot trim any code even if it
is obviously unused.
Hence, it never reduces the kernel binary size.
Rather, it produces a bigger kernel.

The reason is that Clang LTO was implemented against
relocatable ELF (vmlinux.o).

I pointed out this flaw in the review process, but
it was dismissed.

This is the main reason why I did not give any Ack
(but it was merged via Kees Cook's tree).


So, the help text of this option should be revised:

  This option allows for unused exported symbols to be dropped from
  the build. In turn, this provides the compiler more opportunities
  (especially when using LTO) for optimizing the code and reducing
  binary size. This might have some security advantages as well.

Clang LTO does the opposite of what you expect.



> Each EXPORT_SYMBOL() created a
> symbol dependency that prevented LTO from optimizing out the related
> code even though a tiny fraction of those exported symbols were needed.
>
> The idea behind the recursion was to catch those cases where disabling
> an exported symbol within a module would optimize out references to more
> exported symbols that, in turn, could be disabled and possibly trigger
> yet more code elimination. There is no way that can be achieved without
> extra compiler passes in a recursive manner.

I do not understand.

Modules are relocatable ELF.
Clang LTO cannot eliminate any code.
GCC LTO does not work with relocatable ELF
in the first place.


Are you describing how this would work in an ideal world?
Even so, I do not see how LTO could eliminate dead code
from relocatable ELF.




- Current implementation

Clang LTO works against vmlinux.o,
so it is completely useless for the purpose of
eliminating dead code.

So, this case is a don't-care.
TRIM_UNUSED_KSYMS removes only the metadata of EXPORT_SYMBOL;
no further optimization follows anyway.


- What if Clang LTO had been implemented in the final link?
(this means LTO runs 3 times if KALLSYMS_ALL is enabled)

With a proper linker script that uses /DISCARD/,
the metadata of EXPORT_SYMBOL() will be dropped,
and LTO should be able to do further dead code elimination.
So, I guess we do not need to turn EXPORT_SYMBOL into a no-op via CPP
(unless I am missing something).
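
To make the difference concrete, here is a minimal userspace sketch of
what making EXPORT_SYMBOL a no-op in the preprocessor amounts to. It is
not the kernel's real macro: the struct, the SKETCH_* macro names and the
two functions are all made up for illustration, and the real macros also
place the entries in dedicated sections, which is omitted here. The kept
variant emits an entry that holds the function's address; the trimmed
variant emits no entry at all.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one export table entry. */
struct sketch_ksym {
	const char *name;
	int (*fn)(void);
};

/*
 * Kept export: emit an entry that stores the function's address.
 * This reference is one reason an LTO build cannot discard the function.
 */
#define SKETCH_EXPORT_SYMBOL(sym)					\
	static const struct sketch_ksym sketch_ksym_##sym		\
	__attribute__((used)) = { #sym, sym }

/*
 * Trimmed export: no entry is emitted, so nothing references sym.
 * It expands to a bare struct declaration only so that the trailing
 * semicolon at the call site remains valid C.
 */
#define SKETCH_EXPORT_SYMBOL_TRIMMED(sym)				\
	struct sketch_ksym_trimmed_##sym

int needed_by_module(void) { return 1; }
SKETCH_EXPORT_SYMBOL(needed_by_module);

int needed_by_nobody(void) { return 2; }
SKETCH_EXPORT_SYMBOL_TRIMMED(needed_by_nobody);

/* Return 1 if the single kept entry carries the given name, else 0. */
int sketch_is_exported(const char *name)
{
	return strcmp(sketch_ksym_needed_by_module.name, name) == 0;
}

/* Small self-check showing which symbol still has an export entry. */
void sketch_self_check(void)
{
	assert(sketch_is_exported("needed_by_module"));
	assert(!sketch_is_exported("needed_by_nobody"));
	assert(sketch_ksym_needed_by_module.fn() == 1);
}
```

In this sketch the trimmed function is still compiled; whether it is
actually dropped afterwards depends on the link step, which is the point
of the two cases above.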






--
Best Regards
Masahiro Yamada