Re: [PATCH] compiler: enable CONFIG_OPTIMIZE_INLINING forcibly

From: Nick Desaulniers
Date: Tue Oct 01 2019 - 16:21:59 EST


On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin
<linux@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote:
> > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin
> > <linux@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:
> > > > I apologize; I don't mean to be difficult. I would just like to avoid
> > > > surprises when code written with the assumption that it will be
> > > > inlined is not. It sounds like we found one issue in arm32 and one in
> > > > arm64 related to outlining. If we fix those two cases, I think we're
> > > > close to proceeding with Masahiro's cleanup, which I view as a good
> > > > thing for the health of the Linux kernel codebase.
> > >
> > > Except, using the C preprocessor for this turns the arm32 code into
> > > yuck:
> > >
> > > 1. We'd need to turn get_domain() and set_domain() into multi-line
> > > preprocessor macro definitions, using the GCC ({ }) extension
> > > so that get_domain() can return a value.
> > >
> > > 2. uaccess_save_and_enable() and uaccess_restore() also need to
> > > become preprocessor macro definitions too.
> > >
> > > So, we end up with multiple levels of nested preprocessor macros.
> > > When something goes wrong, the compiler warning/error message is
> > > going to be utterly _horrid_.
> >
> > That's why I preferred V1 of Masahiro's patch, that fixed the inline
> > asm not to make use of caller saved registers before calling a
> > function that might not be inlined.
>
> ... which I objected to based on the fact that this uaccess stuff is
> supposed to add protection against the kernel being fooled into
> accessing userspace when it shouldn't. The whole intention there is
> that [sg]et_domain(), and uaccess_*() are _always_ inlined as close
> as possible to the call site of the accessor touching userspace.

Then use the C preprocessor to force the inlining. I'm sorry it's not
as pretty as static inline functions.

>
> Moving it before the assignments mean that the compiler is then free
> to issue memory loads/stores to load up those registers, which is
> exactly what we want to avoid.
>
>
> In any case, I violently disagree with the idea that stuff we have
> in header files should be permitted not to be inlined because we
> have soo much that is marked inline.

So there's a very important subtly here. There's:
1. code that adds `inline` cause "oh maybe it would be nice to inline
this, but if it isn't no big deal"
2. code that if not inlined is somehow not correct.
3. avoid ODR violations via `static inline`

I'll posit that "we have soo much that is marked inline [is
predominantly case 1 or 3, not case 2]." Case 2 is a code smell, and
requires extra scrutiny.

> Having it moved out of line,
> and essentially the same function code appearing in multiple C files
> is really not an improvement over the current situation with excessive
> use of inlining. Anyone who has looked at the code resulting from
> dma_map_single() will know exactly what I'm talking about, which is
> way in excess of the few instructions we have for the uaccess_* stuff
> here.
>
> The right approach is to move stuff out of line - and by that, I
> mean _actually_ move the damn code, so that different compilation
> units can use the same instructions, and thereby gain from the
> whole point of an instruction cache.

And be marked __attribute__((noinline)), otherwise might be inlined via LTO.

>
> The whole "let's make inline not really mean inline" is nothing more
> than a band-aid to the overuse (and abuse) of "inline".

Let's triple check the ISO C11 draft spec just to be sure:
 6.7.4.6: A function declared with an inline function specifier is an
inline function. Making a
function an inline function suggests that calls to the function be as
fast as possible.
The extent to which such suggestions are effective is
implementation-defined. 139)
139) For example, an implementation might never perform inline
substitution, or might only perform inline
substitutions to calls in the scope of an inline declaration.
 J.3.8 [Undefined Behavior] Hints: The extent to which suggestions
made by using the inline function specifier are effective (6.7.4).

My translation:
"Please don't assume inline means anything."

For the unspecified GNU C extension __attribute__((always_inline)), it
seems to me like it's meant more for performing inlining (an
optimization) at -O0. Whether the compiler warns or not seems like a
nice side effect, but provides no strong guarantee otherwise.

I'm sorry that so much code may have been written with that
assumption, and I'm sorry to be the bearer of bad news, but this isn't
a recent change. If code was written under false assumptions, it
should be rewritten. Sorry.
--
Thanks,
~Nick Desaulniers