Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT

From: H. Peter Anvin
Date: Thu Jan 22 2009 - 18:55:52 EST


Zachary Amsden wrote:
On Thu, 2009-01-22 at 14:49 -0800, H. Peter Anvin wrote:

There is also the option to use assembly wrappers to avoid relying on the calling convention. This is particularly worthwhile since we have sites where an instruction as small as two bytes gets bloated with huge push/pop sequences around it. Those would be better served by a direct call to a stub (5 bytes), which would be repatched to the two-byte instruction + 3-byte nop.

Yes, for known trivial ops (most!), there isn't any reason to ever have
a call to begin with; simply an inline instruction sequence would be
fine, and only those callers that override the sequence would need to
patch. It's possible to write clever macros to ensure there is always
space for a 5-byte call.
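
As a rough illustration of the inline-first variant described above, the following user-space sketch overwrites a padded 5-byte site with a direct call. The names emit_call and SITE_LEN are hypothetical and do not come from the kernel; the only fact relied on is the x86 "call rel32" encoding (opcode 0xE8 followed by a 32-bit little-endian displacement measured from the end of the instruction).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of an x86 "call rel32": opcode 0xE8 plus a 4-byte displacement. */
#define SITE_LEN 5

/*
 * Hypothetical helper (not a kernel API): overwrite a padded inline site
 * with a direct call to `target`. `site_addr` is the address at which the
 * site's first byte will execute.
 */
static void emit_call(uint8_t *site, uint64_t site_addr, uint64_t target)
{
    /* The displacement is relative to the end of the call instruction.
     * Unsigned arithmetic wraps, which yields the right two's-complement
     * bytes for backward calls as well. */
    uint32_t rel = (uint32_t)(target - (site_addr + SITE_LEN));
    size_t i;

    assert(site != NULL);

    site[0] = 0xE8;
    /* Store the displacement byte by byte, least significant first. */
    for (i = 0; i < 4; i++)
        site[1 + i] = (uint8_t)(rel >> (8 * i));
}
```

A real implementation would also have to verify that the target is within rel32 range and deal with concurrent execution of the site, both of which this sketch ignores.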


Functionally speaking, it's the same thing... the advantage of starting out with the call and then patching in the native code, as opposed to the other way around, is that we can handle things properly before we're ready to run the patching code.

Right now a number of the call sites contain a huge push/pop sequence followed by an indirect call. We can patch in the native code to avoid the branch overhead, but the register constraints and icache footprint are unchanged.
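
To make the call-first scheme concrete, here is a minimal user-space sketch that replaces a 5-byte call site with a shorter native sequence and fills the remainder with a single NOP of the right length. The function name patch_native and the constant CALL_LEN are hypothetical, and the two native bytes used in the usage note are only a stand-in for whatever the native operation is; the NOP encodings are the commonly recommended multi-byte x86 forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the original "call rel32" that the site starts out with. */
#define CALL_LEN 5

/* Commonly recommended x86 NOP encodings for 1 to 4 bytes. */
static const uint8_t nop1[] = {0x90};
static const uint8_t nop2[] = {0x66, 0x90};
static const uint8_t nop3[] = {0x0f, 0x1f, 0x00};
static const uint8_t nop4[] = {0x0f, 0x1f, 0x40, 0x00};
static const uint8_t *const nops[] = {NULL, nop1, nop2, nop3, nop4};

/*
 * Hypothetical helper (not a kernel API): overwrite the call at `site`
 * with `len` bytes of native code, padding the rest with one NOP.
 * Returns 0 on success, -1 if the native sequence is empty or too long.
 */
static int patch_native(uint8_t *site, const uint8_t *native, size_t len)
{
    size_t pad;

    assert(site != NULL && native != NULL);

    if (len == 0 || len > CALL_LEN)
        return -1;

    pad = CALL_LEN - len;
    memcpy(site, native, len);
    if (pad > 0)
        memcpy(site + len, nops[pad], pad);
    return 0;
}
```

With a two-byte native sequence such as {0x9c, 0x58}, the site ends up as those two bytes followed by the 3-byte NOP 0f 1f 00, i.e. the "two-byte instruction + 3 byte nop" case mentioned earlier in the thread. Note that this only changes the bytes at the site; it does nothing about the surrounding push/pop sequences.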

-hpa
