Re: [PATCH 1/2] x86/fpu: Measure the Latency of XSAVE and XRSTOR

From: Dave Hansen
Date: Mon Jul 25 2022 - 13:44:28 EST


On 7/24/22 13:54, David Laight wrote:
> I've done some experiments that measure short instruction latencies.
> Basically I found:

Short? The instructions in question can write up to about 12k of data.
That's not "short" by any means.

I'm also not sure precision here is all that important. The main things
we want to know here are when and where the init and modified
optimizations come into play. In other words, how often is there actual
data that *needs* to be saved and restored and can't be optimized away?
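
To make that question concrete, here is a rough sketch of what the two
optimizations amount to. It is not kernel code and it simplifies the SDM
rules (it ignores the legacy x87/SSE special cases, for example). The
function name and the xmodified argument are made up for illustration;
the hardware tracks modified state internally and does not expose it.

```c
#include <stdint.h>

/*
 * Simplified model: which state components would an optimized save
 * (XSAVEOPT/XSAVES) actually have to write out?
 *
 * rfbm      - requested-feature bitmap (instruction mask AND XCR0)
 * xinuse    - components that are not in their initial configuration
 * xmodified - components changed since the last matching restore
 *             (hypothetical input, not visible to software)
 * modified_opt_applies - nonzero if the modified optimization can be used
 */
static uint64_t components_written(uint64_t rfbm, uint64_t xinuse,
				   uint64_t xmodified,
				   int modified_opt_applies)
{
	/* Init optimization: skip components still in their init state. */
	uint64_t need = rfbm & xinuse;

	/* Modified optimization: also skip components that did not change. */
	if (modified_opt_applies)
		need &= xmodified;

	return need;
}
```

If the returned mask is small relative to rfbm, most of the work was
optimized away, which is the kind of signal we are after.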

So, sure, if we were measuring a dozen cycles here, you could make an
argument that this _might_ be problematic.

But, in this case, we really just want to be able to tell when
XSAVE/XRSTOR are getting more or less expensive, and to get out a minimal
amount of data (RFBM/XINUSE) to make a guess as to why that might be.

Is it *REALLY* worth throwing serializing instructions in and moving
clock sources to do that? Is the added precision worth it?