Re: 387 emulator hack - mutant AAD trick - any objections?

From: cutaway
Date: Fri May 27 2005 - 08:05:31 EST


From: "Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx>
>
> But certainly not worse than the alternatives for these processors which
> actually lack the x87 subset.
>

What I'm looking for is reasonable compromise on the space issue. I'm
willing to spend a few bytes to help out the weak CPU's this will run on.
The embedded market (and whatever low end desktop machines are still out
there) is fairly size sensitive. Cutting a few pages out might be the
difference between thrashing and not thrashing ;-> I've also got some GCC
hacks in mind so it can treat the SX differently than 386DX. Right now it
doesn't distinguish and makes some very poor (for the SX) choices even when
told to space optimize. Being hyper memory constrained, even something like
the ubiquitous MOV EAX,1 (5 bytes) is slower on the SX than XOR EAX,EAX (2
bytes) / INC EAX (1 byte) - in fact the MOV is about 30% slower than the
XOR/INC :O The situation reverses on the DX and 486. 386SX really needs
*completely* different code gen rules than DX or 486.

I just benchmarked the AAD performance versus alternatives on a 386SLC(a
modestly cached 386SX variation IBM produced) and the the AAD is visible
loser. Using the AAM is a win over the existing code though.

I pondered on the the extended precision divide routine that's being called
in this loop and with a little underhanded treachery managed to eliminate a
push/pop of ESI from it by recycling EBP to address its parms once the frame
was no longer needed. This is an awsome trick when the code is simple enough
that you can get away with doing it and don't need to reference a bunch of
parameters. In this case it only needed two, so you can peel the first off
using the normal EBP, then peel the second directly into EBP itself (which
destroys its usability to address the frame of course, but you've already
got what you wanted and don't care at that point<g>). Then use EPB as you
might use ESI from then on and eliminate the save/restore of ESI. Its
sneaky, but it works.

For a modest number of EBP references (less than 6 or so) the occasional
instruction plumping necessary with implied 0 displacements is well offset
by the elimination of sloshing a couple of dwords on and off the stack to
save a register. push/pop dwords really hurt the SX .

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/