Re: [PATCH 1/2] x86/fpu: Measure the Latency of XSAVE and XRSTOR

From: 'Yi Sun'
Date: Tue Jul 26 2022 - 05:04:20 EST


On 24.07.2022 20:54, David Laight wrote:
> From: Yi Sun
> Sent: 23 July 2022 09:38
>
>> Calculate the latency of instructions xsave and xrstor with new trace
>> points x86_fpu_latency_xsave and x86_fpu_latency_xrstor.
>>
>> The delta TSC can be calculated within a single trace event. Another
>> option considered was to have 2 separated trace events marking the
>> start and finish of the xsave/xrstor instructions. The delta TSC was
>> calculated from the 2 trace points in user space, but there was
>> significant overhead added by the trace function itself.
>>
>> In internal testing, the single trace point option which is
>> implemented here proved to be more accurate.
> ...
>
> I've done some experiments that measure short instruction latencies.
> Basically I found:
> 1) You need a suitable serialising instruction before and after
>    the code being tested - otherwise it can overlap whatever
>    you are using for timing.

The original comments here are admittedly not very exact. The patch
actually uses trace_clock to calculate the latency, not the TSC directly.
The precision here is "at most ~1 jiffy between CPUs", which is probably
acceptable for this usage.
I would like to refine the comments if they are confusing.
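
For reference, the kind of fencing described in point 1 above could look
roughly like the user-space sketch below. It is not taken from the patch
(which uses trace_clock); it assumes x86-64 with GCC or Clang, and the
helper names tsc_begin(), tsc_end() and measure_cycles() are made up for
illustration. LFENCE is not architecturally a fully serialising
instruction, but it is commonly used to order RDTSC against the
surrounding code.

```c
#include <stdint.h>
#include <x86intrin.h>	/* __rdtsc(), __rdtscp(), _mm_lfence() */

/* Hypothetical helper: read the TSC at the start of a measured region. */
static inline uint64_t tsc_begin(void)
{
	uint64_t t;

	_mm_lfence();	/* let earlier instructions complete first */
	t = __rdtsc();
	_mm_lfence();	/* do not start the measured code before the read */
	return t;
}

/* Hypothetical helper: read the TSC at the end of a measured region. */
static inline uint64_t tsc_end(void)
{
	unsigned int aux;
	uint64_t t;

	t = __rdtscp(&aux);	/* waits until earlier instructions have executed */
	_mm_lfence();		/* keep later instructions from starting early */
	return t;
}

/* Hypothetical helper: cycle delta for one call of fn(), taken in place. */
static inline uint64_t measure_cycles(void (*fn)(void))
{
	uint64_t start = tsc_begin();

	fn();
	return tsc_end() - start;
}
```

The result still includes the cost of the fences and the TSC reads
themselves, so for very short sequences one would normally also measure
an empty region and subtract it.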

Thanks
--Sun, Yi