Re: [RFC PATCH] LKMM: Add ctrl_dep() macro for control dependency

From: Mathieu Desnoyers
Date: Fri Oct 01 2021 - 12:13:33 EST


----- On Sep 29, 2021, at 1:41 PM, Segher Boessenkool segher@xxxxxxxxxxxxxxxxxxx wrote:

> Hi!
>
> On Wed, Sep 29, 2021 at 02:28:37PM +0200, Florian Weimer wrote:
>> If you need a specific instruction emitted, you need a compiler
>> intrinsic or inline assembly.
>
> Not an intrinsic. Builtins (like almost all other code) do not say
> "generate this particular machine code", they say "generate code that
> does <this>". That is one reason why builtins are more powerful than
> inline assembler (another related reason is that they tell the compiler
> exactly what behaviour is expected).
>
>> I don't think it's possible to piggy-back this on something else.
>
> Unless we get a description of what this does in terms of language
> semantics (instead of generated machine code), there is no hope, even.

Hi Segher,

Let me make a slightly improved attempt at describing what I am looking
for in terms of language semantics.

First, let's suppose we define two new compiler builtins, e.g.
__sync_ctrl_dep_rw() and __sync_ctrl_dep_acquire().

Their task would be to ensure that an R->W dependency (or, for the acquire
variant, an R->RW dependency) is present between the volatile loads used as
input to the evaluated expression and the subsequent volatile stores (and, for
R->RW, volatile loads), volatile asm statements, and memory clobbers, in the
following situations:

When the builtin is used around the evaluation of the left operand of a &&
(logical AND) or || (logical OR) expression, the R->W or R->RW dependency
should be present before the right operand is evaluated.
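To make the && and || case more concrete, here is a rough C sketch. Since no
compiler provides these builtins, ctrl_dep_rw() is a hypothetical stand-in
that falls back to the heavier-weight approach: an acquire fence issued after
the wrapped expression has been evaluated. It uses C11 relaxed atomics where
the proposal speaks of volatile accesses, because C11 fences are only
specified in terms of atomic operations, and the variable names are made up
for illustration. On common hardware mappings the fence keeps the load
ordered before later loads and stores; formally, C11 fences order accesses
only through synchronizes-with, so treat this as an illustration rather than
a verified equivalent.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the proposed __sync_ctrl_dep_rw() builtin,
 * which no compiler provides. The argument (holding the load) is
 * evaluated before the call; the acquire fence is the explicit-barrier
 * fallback, not the cheap branch-based approach. */
static inline long ctrl_dep_rw(long v)
{
	atomic_thread_fence(memory_order_acquire);
	return v;
}

/* Illustrative shared variables. */
static _Atomic int ready, busy, ack;

/* &&: the store in the right operand must stay ordered after the
 * load of 'ready' in the left operand. */
static bool and_case(void)
{
	return ctrl_dep_rw(atomic_load_explicit(&ready, memory_order_relaxed)) &&
	       (atomic_store_explicit(&ack, 1, memory_order_relaxed), true);
}

/* ||: the store runs only when 'busy' reads as zero, and must stay
 * ordered after that load. */
static bool or_case(void)
{
	return ctrl_dep_rw(atomic_load_explicit(&busy, memory_order_relaxed)) ||
	       (atomic_store_explicit(&ack, 2, memory_order_relaxed), true);
}
```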

When the builtin is used around the evaluation of the first operand of the
conditional (ternary "?:") operator, the R->W or R->RW dependency should be
present before the second or third operand is evaluated.
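The ?: case follows the same pattern. Again, ctrl_dep_rw() is a hypothetical
stand-in for the proposed builtin (an acquire fence after the first operand
is evaluated), C11 relaxed atomics take the place of volatile accesses, and
the variable names are illustrative:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the proposed __sync_ctrl_dep_rw() builtin
 * (no compiler provides it); falls back to an explicit acquire fence
 * issued after the wrapped expression has been evaluated. */
static inline long ctrl_dep_rw(long v)
{
	atomic_thread_fence(memory_order_acquire);
	return v;
}

/* Illustrative shared variables. */
static _Atomic int mode, fast_path, slow_path;

/* ?: whichever of the second or third operands is selected, its store
 * must stay ordered after the load of 'mode' in the first operand. */
static int select_path(void)
{
	return ctrl_dep_rw(atomic_load_explicit(&mode, memory_order_relaxed))
		? (atomic_store_explicit(&fast_path, 1, memory_order_relaxed), 1)
		: (atomic_store_explicit(&slow_path, 1, memory_order_relaxed), 2);
}
```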

When the builtin is used around the evaluation of the controlling expression
of an if, switch, while, or do-while statement, or of the second expression of
a for statement, the R->W or R->RW dependency should be present at the
sequence point that follows that expression, before anything sequenced after
it is evaluated.
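For the statement forms, a similar sketch: ctrl_dep_rw() and
ctrl_dep_acquire() are hypothetical stand-ins for the proposed builtins, both
falling back to an acquire fence, with C11 relaxed atomics in place of
volatile accesses and illustrative variable names. The switch uses the
acquire variant because the selected case performs a load, which needs R->R
ordering as well:

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the proposed __sync_ctrl_dep_rw() and
 * __sync_ctrl_dep_acquire() builtins (no compiler provides them).
 * Both fall back to an acquire fence after the wrapped expression is
 * evaluated; on common hardware mappings that orders the load before
 * later loads and stores, covering R->RW and hence R->W. */
static inline long ctrl_dep_rw(long v)
{
	atomic_thread_fence(memory_order_acquire);
	return v;
}

static inline long ctrl_dep_acquire(long v)
{
	atomic_thread_fence(memory_order_acquire);
	return v;
}

/* Illustrative shared variables. */
static _Atomic int flag, out, state, payload, count;

/* if: the store in the body must stay ordered after the load of 'flag'. */
static void if_case(void)
{
	if (ctrl_dep_rw(atomic_load_explicit(&flag, memory_order_relaxed)))
		atomic_store_explicit(&out, 1, memory_order_relaxed);
}

/* switch, acquire variant: the load of 'payload' in the selected case
 * must also stay ordered after the load of 'state' (R->R). */
static int switch_case(void)
{
	switch (ctrl_dep_acquire(atomic_load_explicit(&state, memory_order_relaxed))) {
	case 1:
		return atomic_load_explicit(&payload, memory_order_relaxed);
	default:
		return -1;
	}
}

/* for: the whole controlling expression is wrapped, so each decrement
 * stays ordered after the load that allowed that iteration to run. */
static int drain(void)
{
	int n = 0;

	for (; ctrl_dep_rw(atomic_load_explicit(&count, memory_order_relaxed) > 0); n++)
		atomic_fetch_sub_explicit(&count, 1, memory_order_relaxed);
	return n;
}
```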

One cheap way to achieve this R->W dependency (and the R->RW dependency on
architectures which do not reorder R->R) is to ensure that the generated
assembly contains a conditional branch. Heavier-weight alternatives, such as
explicit barriers, also work.
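For comparison, the cheap branch-based approach needs no builtin at all, as
long as the branch survives compilation. LOAD_ONCE() and STORE_ONCE() below
are simplified, int-only local macros in the spirit of the kernel's
READ_ONCE()/WRITE_ONCE(), not their actual definitions. The second function
shows why language-level semantics are needed: nothing in C obliges the
compiler to keep the branch.

```c
/* Simplified, int-only volatile accessors (illustrative, not the
 * kernel's READ_ONCE()/WRITE_ONCE()). */
#define LOAD_ONCE(x)     (*(const volatile int *)&(x))
#define STORE_ONCE(x, v) (*(volatile int *)&(x) = (v))

static int flag, out;

/* Relies on the compiler emitting a conditional branch between the
 * volatile load and the volatile store. CPUs do not make speculative
 * stores visible, so that branch orders R->W without a barrier. */
static void branch_ordered(void)
{
	if (LOAD_ONCE(flag))
		STORE_ONCE(out, 1);
}

/* Fragile: both arms perform the same store, so the compiler may
 * merge them into one unconditional store after the load. The
 * volatile accesses stay in program order, but the branch, and with
 * it the hardware R->W ordering, disappears. */
static void branch_may_vanish(void)
{
	if (LOAD_ONCE(flag))
		STORE_ONCE(out, 1);
	else
		STORE_ONCE(out, 1);
}
```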

Hopefully my description above is slightly closer to the expected language
semantics.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com