Re: [PATCH 0/7] ARM: hacks for link-time optimization

From: Andi Kleen
Date: Fri Dec 21 2018 - 12:20:48 EST


> In particular turning an address-dependency into a control-dependency,
> which is something allowed by the C language, since it doesn't recognise
> these concepts as such.
>
> The 'optimization' is allowed currently, but LTO will make it much more
> likely since it will have a much wider view of things. Esp. when combined
> with PGO.
>
> Specifically, if you have something like:
>
> int idx;
> struct object objs[2];
>
> the statement:
>
> val = objs[idx & 1].ponies;
>
> which you 'need' to be translated like:
>
> struct object *obj = objs;
> obj += (idx & 1);
> val = obj->ponies;
>
> Such that the load of obj->ponies depends on the load of idx. However
> our dear compiler is allowed to make it:
>
> if (idx & 1)
> 	obj = &objs[1];
> else
> 	obj = &objs[0];
>
> val = obj->ponies;

I don't see why a compiler would do such an optimization. Clearly
the second variant is worse than the first: it is bigger and needs
branch prediction resources.

In fact, compilers usually try hard to go in the other direction
and apply if-conversion.
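
To make the comparison concrete, here is a minimal sketch of the two
forms as standalone functions. The struct layout and the function names
(load_by_index, load_by_branch) are assumptions for illustration only.
In single-threaded C both return the same value, which is why the
language permits either; if-conversion is, roughly, the rewrite from the
second form back to the first.

```c
/* Hypothetical stand-in for the struct in the quoted example. */
struct object {
	int ponies;
};

static struct object objs[2];

/* Address-dependency form: the load address is computed from idx. */
static int load_by_index(int idx)
{
	struct object *obj = objs;

	obj += (idx & 1);
	return obj->ponies;
}

/* Control-dependency form: idx only decides which branch is taken. */
static int load_by_branch(int idx)
{
	struct object *obj;

	if (idx & 1)
		obj = &objs[1];
	else
		obj = &objs[0];

	return obj->ponies;
}
```

Which of the two a given compiler actually emits depends on the target,
the optimization level and any profile data, so the generated assembly
is the only reliable check.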

Has anyone seen real world examples of such changes being done, or is this
all language lawyering theory?

-Andi

>
> Because C doesn't recognise this as being different. However this is
> utterly broken, because in this translation we can speculate the load
> of obj->ponies such that it no longer depends on the load of idx, which
> breaks RCU.
>
> Note that further 'optimization' is possible and the compiler could even
> make it:
>
> if (idx & 1)
> 	val = objs[1].ponies;
> else
> 	val = objs[0].ponies;
>
> Now, granted, this is a fairly artificial example, but it does
> illustrate the exact problem.
>
> The more the compiler can see of the complete program, the more likely
> it can make inferences like this, esp. when coupled with PGO.
>
> Now, we're (usually) very careful to wrap things in READ_ONCE() and
> rcu_dereference() and the like, which makes it harder on the compiler
> (because 'volatile' is special), but nothing really stops it from doing
> this.
>
> Paul has been trying to beat clue into the language people, but given
> he's been at it for 10 years now, and there's no resolution, I figure we
> ought to get compiler implementations to give us a knob.
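
On the READ_ONCE() point quoted above, a minimal sketch of what the
'volatile' buys may help. READ_ONCE_SKETCH is a simplified, hypothetical
stand-in and not the kernel's actual macro; it assumes the GNU
__typeof__ extension. The volatile access makes the compiler emit the
load of idx once, but by itself it does not spell out how the dependent
access to objs[] has to be generated afterwards.

```c
/* Simplified stand-in for a READ_ONCE()-style macro (GNU __typeof__). */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the struct in the quoted example. */
struct object {
	int ponies;
};

static struct object objs[2];
static int idx;

static int read_ponies(void)
{
	/* One volatile load of idx; the result feeds the array index. */
	int i = READ_ONCE_SKETCH(idx);

	return objs[i & 1].ponies;
}
```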