Re: CONFIG_OPTIMIZE_INLINING fun

From: Andi Kleen
Date: Sat Nov 15 2008 - 11:26:00 EST



> gcc 4.3.2 wasn't doing well enough: it was still generating lots
> of constant_test_bits, even without CONFIG_CC_OPTIMIZE_FOR_SIZE.

Hmm I thought the heuristics had been improved in 4.3, or maybe it
was 4.4 only.

> Mind you, I've just checked its __delete_from_swap_cache is okay,
> doing efficient tests without calling anything else. Maybe 4.3.2
> is actually making a good judgement of when constant_test_bit is
> appropriate (maybe: I don't know)

constant_test_bit is supposed to be optimized away, so it
should never be inline ideally. The problem with the older
gcc heuristics was that it just counted code to
determine if a function is suitable for inlining
or not, without checking how much of it can be constant
evaluated first.

- and the problem is rather that
> we get so many copies of the same code, one per compilation unit.
>
> I'd have liked to try 4.4, but seem unable to download what's needed
> for it today: maybe someone else has a snapshot already and can say
> whether it too puts lots of constant_test_bit lines in System.map
> when the kernel is built with CONFIG_OPTIMIZE_INLINING=y (with and
> without CONFIG_CC_OPTIMIZE_FOR_SIZE=y).

I tried with the older gcc 4.4 snapshot I had around, but it
ICEed on setup_percpu.c unfortunately.

> I did follow Sam's advice, and once I'd changed all the inlines
> in include/linux/page-flags.h and include/linux/page_cgroup.h
> and arch/x86/include/asm/bitops.h to __always_inline, then yes,
> gcc 4.2.1 stopped giving me lots of constant_test_bits and " Page"
> functions with CC_OPTIMIZE_FOR_SIZE=y and OPTIMIZE_INLINING=y, and
> the code generated for __delete_from_swap_cache was reasonable.
>
> But how do we know to stop there? that's just as far as I'd
> noticed. It is quite conceivable that constant_test_bit poses the
> biggest challenge to gcc, but there may be other bad examples too.

As long as there isn't significant code in the inline that
is expected to be optimized away then this shouldn't be needed.

BTW an alternative would be to also tweak the respective
thresholds in gcc using --param, but the problem is that
the behaviour of those changes between options so I don't
think it's a good idea.

> If we're reduced to adding __always_inline in lots of places,
> doesn't that mean CONFIG_OPTIMIZE_INLINING just isn't working?

It's not working for the case when a lot of code in the inline
is expected to be optimized away (as is the case with constant_test_bit
which is expected to be 100% optimized). Otherwise it should work.

BTW that is why it is also not needed to mark
all of bitops.h, but only a few functions there (see my patch)


> that we'd be better just to switch it off? Because later on
> some other improvement will be made, and we'll want the things
> already marked __always_inline to become __usually_inline_but_
> _not_if_new_gcc_knows_better, etc. Our unceasing inline battles.
>
> I'd have thought it better to leave the "inline"s as they are,
> but retune the __GNUC__ version in compiler-gcc.h, and reflect
> what's known in the advice in the OPTIMIZE_INLINING help text.

If we had a lot of inlines like that I would I agree with you,
but I think there are few enough of them that manually
marking them is better.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/