Re: [PATCH v3] x86: use builtins to read eflags

From: Bill Wendling
Date: Tue Feb 08 2022 - 18:19:07 EST


On Tue, Feb 8, 2022 at 1:14 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> From: Nick Desaulniers
> > Sent: 07 February 2022 22:12
> >
> > On Thu, Feb 3, 2022 at 4:57 PM Bill Wendling <morbo@xxxxxxxxxx> wrote:
> > >
> > > GCC and Clang both have builtins to read and write the EFLAGS register.
> > > This allows the compiler to determine the best way to generate this
> > > code, which can improve code generation.
> > >
> > > This issue arose due to Clang's issue with the "=rm" constraint. Clang
> > > chooses to be conservative in these situations, and so uses memory
> > > instead of registers. This is a known issue, which is currently being
> > > addressed.
>
> How much performance would be lost by just using "=r"?
>
> You get two instructions if the actual target is memory.
> This might be a marginal code size increase - but not much,
> It might also slow things down if the execution is limited
> by the instruction decoder.
>
> But on Intel cpu 'pop memory' is 2 uops, exactly the same
> as 'pop register' 'store register' (and I think amd is similar).
> So the actual execution time is exactly the same for both.
>
> Also it looks like clang's builtin is effectively "=r".
> Compiling:
> long fl;
> void native_save_fl(void) {
> fl = __builtin_ia32_readeflags_u64();
> }
> Not only generates a stack frame, it also generates:
> pushf; pop %rax; mov mem, %rax.
>
It used to be "=r" (see f1f029c7bfbf4e), but was changed back to "=rm"
in ab94fcf528d127. This pinging back and forth is another reason to
use the builtins and be done with it.

-bw