Re: [patch] entry.S asm improvement (removed some ugly jmp)

John Reiser (jreiser@teleport.com)
Sat, 28 Nov 1998 08:35:24 -0800


The processor has an internal stack of return addresses from CALL
instructions. This stack holds up to 4 addresses, and is used to predict
the destination address of a RET so that instruction prefetch can happen
earlier. If the prediction misses, then the penalty is about a dozen
cycles on a Pentium Pro, Pentium II, Celeron, Xeon, or any Intel x86
processor with "dynamic execution". The penalty on a Pentium is about
five cycles. Thus, the penalty is about the same as a pipeline flush, and
on a dynamic execution processor it is also about the same as a cache
miss. The first mispredicted RET could cause every RET still in the
internal stack to be mispredicted as well (that is the easy
implementation). So, be careful in high-frequency areas: a return
address pushed with PUSHL is not recorded in the internal stack, so the
RET that consumes it cannot be predicted correctly from it. But if there
will be a context switch, or if the depth is now only 1, or will exceed
4 before the RET, then definitely do use the PUSHL+JMP instead of
CALL+JMP.
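
To make the bookkeeping concrete, here is a toy model of the internal
return stack, written in Python. It is only a sketch under stated
assumptions, not a description of the real hardware: the function name
count_mispredicts, the event names "call", "push_jmp" and "ret", and the
symbolic addresses are all made up for illustration. It assumes the
"easy implementation", in which a mispredicted RET still consumes the
top entry, and it assumes the oldest entry is dropped when a fifth CALL
arrives.

```python
from collections import deque


def count_mispredicts(events, depth=4):
    """Count mispredicted RETs for a sequence of (op, addr) events.

    Hypothetical toy model: "call" records addr in the predictor,
    "push_jmp" places addr on the real stack only (the predictor never
    sees it), and "ret" returns to addr and checks the prediction.
    """
    predictor = deque(maxlen=depth)  # assumed: oldest entry lost on overflow
    misses = 0
    for op, addr in events:
        if op == "call":
            predictor.append(addr)
        elif op == "push_jmp":
            pass  # PUSHL+JMP leaves the predictor untouched
        elif op == "ret":
            # Assumed "easy implementation": the top entry is consumed
            # whether or not it matches the real return address.
            predicted = predictor.pop() if predictor else None
            if predicted != addr:
                misses += 1
        else:
            raise ValueError("unknown op: %r" % (op,))
    return misses


# CALL followed by JMP: the RET goes back to the JMP, as predicted.
call_then_jmp = [("call", "outer"), ("call", "jmp_insn"),
                 ("ret", "jmp_insn"), ("ret", "outer")]

# PUSHL followed by JMP: the RET goes to an address the predictor never saw.
push_then_jmp = [("call", "outer"), ("push_jmp", "target"),
                 ("ret", "target"), ("ret", "outer")]

# Six nested CALLs overflow a 4-entry predictor before the RETs start.
too_deep = ([("call", n) for n in range(1, 7)] +
            [("ret", n) for n in range(6, 0, -1)])

if __name__ == "__main__":
    for name, seq in [("call_then_jmp", call_then_jmp),
                      ("push_then_jmp", push_then_jmp),
                      ("too_deep", too_deep)]:
        print(name, count_mispredicts(seq))
```

In this model the CALL+JMP sequence misses 0 times and the PUSHL+JMP
sequence misses twice: once on the RET to the pushed address, and once
more on the outer RET, whose entry was consumed by the first miss. The
six-deep nesting also misses twice, on the two outermost RETs, so in
that case the prediction was going to be lost anyway.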

-- 
-----------------------------------------------------------
jreiser@teleport.com (John Reiser)
-----------------------------------------------------------
