Re: [PATCH 00/46] gcc-LTO support for the kernel

From: Richard Biener
Date: Thu Nov 17 2022 - 08:55:15 EST


On Thu, 17 Nov 2022, Ard Biesheuvel wrote:

> On Thu, 17 Nov 2022 at 12:43, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Nov 17, 2022 at 08:50:59AM +0000, Richard Biener wrote:
> > > On Thu, 17 Nov 2022, Peter Zijlstra wrote:
> > >
> > > > On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> > > > > On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <jirislaby@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > this is the first call for comments (and kbuild complaints) for this
> > > > > > support of gcc (full) LTO in the kernel. Most of the patches come from
> > > > > > Andi. Me and Martin rebased them to new kernels and fixed the to-us
> > > > > > known issues. Also I updated most of the commit logs and reordered the
> > > > > > patches to groups of patches with similar intent.
> > > > > >
> > > > > > The very first patch comes from Alexander and is pending on some x86
> > > > > > queue already (I believe). I am attaching it only for completeness.
> > > > > > Without that, the kernel does not boot (LTO reorders a lot).
> > > > > >
> > > > > > In our measurements, the performance differences are negligible.
> > > > > >
> > > > > > The kernel is bigger with gcc LTO due to more inlining.
> > > > >
> > > > > OK, so if I understand this correctly:
> > > > > - the performance is the same
> > > > > - the resulting image is bigger
> > > > > - we need a whole lot of ugly hacks to placate the linker.
> > > > >
> > > > > Pardon my cynicism, but this cover letter does not mention any
> > > > > advantages of LTO, so what is the point of all of this?
> > > >
> > > > Seconded; I really hate all the ugly required for the GCC-LTO
> > > > 'solution'. There not actually being any benefit just makes it a very
> > > > simple decision to drop all these patches on the floor.
> > >
> > > I'd say that instead a prerequisite for the series would be to actually
> > > enforce hidden visibility for everything not part of the kernel module
> > > API so the compiler can throw away unused functions. Currently it has
> > > to keep everything because with a shared object there might be external
> > > references to everything exported from individual TUs.
> >
> > I'm not sure what you're on about; only symbols annotated with
> > EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
> > have their address taken. You can freely eliminate any unused symbol.

But IIRC that's not reflected at the ELF level by making EXPORT_SYMBOL*()
symbols public and the rest hidden. Instead, all symbols that are global in
the C TUs become public, and the details of the module dynamic loader are
hidden from GCC's view of the kernel image as an ELF relocatable object.
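
To illustrate what "public and the rest hidden" would mean at the ELF level,
here is a minimal sketch using GCC's visibility attribute. All DEMO_* and
demo_* names are hypothetical and exist only for this example; this is not
how EXPORT_SYMBOL*() is implemented.

```c
/* Hypothetical stand-ins for illustration; these are not kernel macros. */
#define DEMO_EXPORTED __attribute__((visibility("default")))
#define DEMO_HIDDEN   __attribute__((visibility("hidden")))

/*
 * Global in the C sense (other TUs can call it), but hidden at the ELF
 * level: nothing outside the final link can reference it, so a
 * whole-program optimizer may inline or drop it once it is unused.
 */
DEMO_HIDDEN int demo_internal_helper(int x)
{
	return x * 2;
}

/*
 * Part of the (hypothetical) module API: default visibility tells the
 * compiler that external references may exist, so it must be kept.
 */
DEMO_EXPORTED int demo_module_api(int x)
{
	return demo_internal_helper(x) + 1;
}
```

With annotations along these lines, only demo_module_api() would have to
survive as a visible symbol.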

> > > There was a size benefit mentioned for module-less monolithic kernels
> > > as likely used in embedded setups, not sure if that's enough motivation
> > > to properly annotate symbols with visibility - and as far as I understand
> > > all these 'required' are actually such fixes.
> >
> > I'm not seeing how littering __visible is useful or desired, doubly so
> > for that static hack, that's just a crude work around for GCC LTO being
> > inferior for not being able to read inline asm.
>
> We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
> symbols that may appear to the compiler as though they are never
> referenced.
>
> Would it be possible to repurpose those so that the LTO code knows
> which symbols it must not remove?

I find

/*
 * Force the compiler to emit 'sym' as a symbol, so that we can reference
 * it from inline assembler. Necessary in case 'sym' could be inlined
 * otherwise, or eliminated entirely due to lack of references that are
 * visible to the compiler.
 */
#define ___ADDRESSABLE(sym, __attrs) \
	static void * __used __attrs \
		__UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
#define __ADDRESSABLE(sym) \
	___ADDRESSABLE(sym, __section(".discard.addressable"))

that should be enough to force LTO to keep 'sym', unless there's
a linker script that discards .discard.addressable, which I fear LTO
will notice, losing the effect. A more direct way would be to attach
__used to 'sym' directly. __ADDRESSABLE doesn't seem to be used
directly; instead I see cases like

#define __define_initcall_stub(__stub, fn) \
	int __init __stub(void); \
	int __init __stub(void) \
	{ \
		return fn(); \
	} \
	__ADDRESSABLE(__stub)

where one could have added __used to the __stub prototypes instead?
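
A rough sketch of that alternative, compilable outside the kernel. The
demo_* names are made up for this example, and demo_used merely stands in
for the kernel's __used; __init is left out because it only places the
function in an init section.

```c
/* Stand-in for the kernel's __used (assumption: GCC or Clang). */
#define demo_used __attribute__((__used__))

/* Counts calls so the effect of the stub can be observed. */
int demo_init_calls;

/* Hypothetical initcall body; returns 0 for success like a real one. */
static int demo_init(void)
{
	demo_init_calls++;
	return 0;
}

/*
 * Variant of the stub macro that puts the "keep me" attribute on the
 * stub itself instead of emitting a separate pointer that references it.
 */
#define demo_define_initcall_stub(stub, fn) \
	int demo_used stub(void); \
	int demo_used stub(void) \
	{ \
		return fn(); \
	}

demo_define_initcall_stub(demo_initcall_stub, demo_init)
```

Whether this is sufficient for every user of the stub is something the
people who did the LTO enablement would have to confirm.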

The folks who worked on LTO enablement of the kernel should know the
real issue better. I understand asm()s are a pain because GCC
refuses to parse the assembler string heuristically for used
symbols (but it can never be more than heuristics). The issue with
asm()s is not so much elimination (__used solves that) but that
GCC can end up moving the asm() and the symbols it refers to into
different link-time units, leaving references to non-global symbols
unresolved. -fno-toplevel-reorder should fix that at some cost.
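
For illustration, a small sketch of the pattern in question: a top-level
asm() that names a static symbol the compiler never sees referenced. It
assumes an ELF target with GNU assembler syntax and no symbol-name prefix
(for example Linux with GCC), and all demo_* names are hypothetical.

```c
/* Stand-in for the kernel's __used (assumption: GCC or Clang). */
#define demo_used __attribute__((__used__))

/*
 * Never referenced from C code. Without the attribute the compiler
 * would be free to drop it, since it cannot see the use in the asm.
 */
static const int demo_used demo_hidden_value = 42;

/*
 * The compiler does not parse this string, so it does not know that
 * the asm depends on demo_hidden_value living in the same object.
 */
__asm__(".globl demo_value_alias\n\t"
	".set demo_value_alias, demo_hidden_value");

/* C-level declaration of the alias created by the asm above. */
extern const int demo_value_alias;

int demo_read_alias(void)
{
	return demo_value_alias;
}
```

In a single non-LTO translation unit this links fine, because the asm and
the static symbol end up in the same object file. The trouble described
above starts once they are separated.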

Richard.

--
Richard Biener <rguenther@xxxxxxx>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)