Re: Code optimization <LEA Instruction>

From: Jamie Lokier (lkd@tantalophile.demon.co.uk)
Date: Fri Jan 28 2000 - 07:15:22 EST


Richard B. Johnson wrote:
> The following program clearly shows that many more 'addl' instructions
> may be executed than 'leal' instructions within a given time.

Your test is wrong because the particular `leal' instructions you use
are dependent: each one reads the %eax that the previous one has just
written. The delay you are seeing is due to pipeline scheduling, not
the instruction itself. If you use the instruction in a different
context, you will find it is fast.

This is an Address Generation Interlock (AGI) stall, which affects
Pentiums and 486s. See _any_ 486 or Pentium optimisation book; there
will be a section on it. AGI is an important scheduling detail. I
think modern GCC knows about it and schedules accordingly. Certainly,
PGCC does.

> "\tleal 2(%eax), %eax\n" /* 10 leals */
> "\tleal 2(%eax), %eax\n"
> "\tleal 2(%eax), %eax\n"
> "\tleal 2(%eax), %eax\n"
> [...]

On a 486, try writing

        leal 2(%eax),%eax
        leal 2(%ebx),%ebx
        leal 2(%eax),%eax
        leal 2(%ebx),%ebx

etc. instead.

On a Pentium, try writing

        leal 2(%eax),%eax
        leal 2(%ebx),%ebx
        leal 2(%ecx),%ecx
        leal 2(%edx),%edx
        leal 2(%eax),%eax
        leal 2(%ebx),%ebx
        leal 2(%ecx),%ecx
        leal 2(%edx),%edx

instead. Then go and read about AGIs.

You're right that `leal' is not always faster than `addl', because
`addl' doesn't have the AGI stall. But it does depend on context.

I think you should still apologise to Alan anyway :-)

And Alan should apologise for being so terse, because it does depend on
context and he didn't say so :-)

have a nice day
-- Jamie



