Re: <linux/linkage.h> generates incorrect cache alignments for 486 and above

From: Jamie Lokier (lkd@tantalophile.demon.co.uk)
Date: Fri Jan 28 2000 - 07:04:26 EST


[Linus, you're Cc'ed because this refers to a previous discussion with
you, and because there's a suggested optimisation to i386 entry.S at the end]

Chris Sears wrote:
> So this is where the 16 comes from. These ifetch blocks still get
> fetched from the cache. Ya wanna see a bad example? This is from entry.S
>
> ENTRY(system_call)
> ...
> movl %eax,EAX(%esp) # save the return value
> ALIGN
> .globl ret_from_sys_call
> .globl ret_from_intr
> ret_from_sys_call:
> movl SYMBOL_NAME(bh_mask),%eax
> ...
>
> This is the output from readelf -i 1 entry.o
>
> 0x000000f4 movl %eax,0x18(%esp)
> 0x000000f8 nop
> 0x000000f9 leal 0x0(%esi),%esi
> 0x00000100 movl 0x00000000,%eax
> 0x00000105 andl 0x00000000,%eax
>
> What is this "leal" junk? That there is one very large nop.
> The price of the alignment is two nop instructions.
> If ALIGN were set to the cache line size it would be the same
> because system_call to the ALIGN is very near two cache lines
> already.
>
> This one is a tough call. Two nops in the straightline code vs
> mis-alignment in the shared code. Ok, one nop,
> they would be paired in the UV pipeline. Probably leave it be.

Linus already gave his opinion on this months ago: the ALIGN should go
because syscalls are the most common path. However, he didn't remove it.

On a Pentium, as you say, the alignment costs only 1 cycle for the
paired nops. That is probably less of a hit than the misaligned jump in
the page fault case.
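
To make the padding arithmetic concrete, here is a minimal C sketch. It
is not kernel code; the helper name pad_bytes is made up for
illustration. It computes how many filler bytes are needed to reach the
next boundary, using the addresses from the dump quoted above (the movl
ends at 0xf8 and the aligned label lands at 0x100).

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the kernel: number of padding
 * bytes needed so that the next instruction starts on an `align`-byte
 * boundary.  `align` is assumed to be a power of two.
 */
uint32_t pad_bytes(uint32_t addr, uint32_t align)
{
    /* Distance to the next boundary; 0 if addr is already aligned. */
    return (align - (addr & (align - 1))) & (align - 1);
}
```

With addr = 0xf8 and align = 16 this gives 8 bytes, which matches the
1-byte nop plus the 7-byte leal in the dump. Assuming a 32-byte cache
line, pad_bytes(0xf8, 32) is also 8, consistent with Chris's remark
that aligning to the cache line size would come out the same here.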

That is especially so because the jump target is a pair of large
instructions (10 bytes for the pair), although I don't know whether it
makes any difference, as I don't know whether the Pentium's decoder can
work with partial ifetch blocks.
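
As a rough illustration of the concern, the following C sketch checks
whether a run of instruction bytes straddles a fetch-block boundary. It
is only a sketch: the helper name straddles_block is hypothetical, and
the 16-byte block size is taken from the discussion quoted above. It
says nothing about what the Pentium's decoder actually does with a
partial block.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does the byte range [addr, addr + len) cross a
 * `block`-byte boundary?  `block` is assumed to be a power of two and
 * `len` to be at least 1.
 */
bool straddles_block(uint32_t addr, uint32_t len, uint32_t block)
{
    uint32_t first = addr & ~(block - 1);             /* block holding the first byte */
    uint32_t last  = (addr + len - 1) & ~(block - 1); /* block holding the last byte */
    return first != last;
}
```

For the 10-byte movl/andl pair, straddles_block(0x100, 10, 16) is false:
with the ALIGN the pair sits inside one block. Without the ALIGN the
label would fall at 0xf8, and straddles_block(0xf8, 10, 16) is true: the
pair would be split across the boundary at 0x100.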

I'd guess the two most common paths are system calls and page faults.
Some applications will fault often, others will make syscalls a lot.
That's my guess (I haven't measured anything).

So on balance I'd leave the ALIGN there. Even though the exit code is
very small, most applications do syscalls _and_ page faults, so even
duplicating a single cache line will sometimes mean one more cache line
of the application to reload later. And we know that a cache line takes
longer to load than a paired nop takes to execute.

Having said that, a cycle or two can be shaved off the page_fault path
by making it the case that falls through to error_code instead of
divide_error. I'm sure that page_fault is a lot more common than
divide_error, so that would seem a sensible tweak.

have a nice day,
-- Jamie



