[PATCH] x86/vdso: Use non-serializing instruction rdtsc

From: Rong Tao
Date: Tue May 16 2023 - 02:57:55 EST


From: Rong Tao <rongtao@xxxxxxxx>

rdtsc_ordered() reads the TSC with rdtscp or 'lfence;rdtsc'. Replacing it
with the non-serializing rdtsc instruction in the vDSO makes clock_gettime()
about 40% faster in the measurements below, at the cost of a small loss of
precision.

The RDTSCP instruction is not a serializing instruction, but it does wait
until all previous instructions have executed and all previous loads are
globally visible. The RDTSC instruction is not a serializing instruction.
It does not necessarily wait until all previous instructions have been
executed before reading the counter.
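
For illustration only, the three ways of reading the TSC discussed above can
be sketched in user-space C with GCC/Clang inline assembly. The function
names tsc_read_plain(), tsc_read_lfence() and tsc_read_rdtscp() are
hypothetical and are not the kernel's helpers. The sketch assumes an x86 CPU,
and RDTSCP support for the last variant.

```c
#include <stdint.h>

/* Plain RDTSC: may execute before earlier instructions have completed. */
static inline uint64_t tsc_read_plain(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* LFENCE; RDTSC: the fence waits for earlier instructions to complete. */
static inline uint64_t tsc_read_lfence(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__("lfence; rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* RDTSCP: waits for earlier instructions; also writes IA32_TSC_AUX to ECX. */
static inline uint64_t tsc_read_rdtscp(void)
{
    uint32_t lo, hi, aux;

    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux) : : "memory");
    (void)aux; /* the auxiliary value is not needed for the timestamp */
    return ((uint64_t)hi << 32) | lo;
}
```

The "memory" clobber on the two ordered variants only stops the compiler
from moving memory accesses across the read. The CPU-level ordering comes
from the instructions themselves.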

Measure the time taken by repeated vDSO clock_gettime() calls, pseudo code:

    count = 1000 * 1000 * 100;
    while (count--)
        clock_gettime(CLOCK_REALTIME, &ts);
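
A self-contained version of this loop might look like the sketch below. The
patch does not say how the elapsed time was taken, so timing the loop with
CLOCK_MONOTONIC is an assumption, and now_ns() and bench_clock_gettime() are
hypothetical helper names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Call clock_gettime(CLOCK_REALTIME) `count` times and return the
 * elapsed wall time of the whole loop in nanoseconds.
 */
static uint64_t bench_clock_gettime(long count)
{
    struct timespec ts;
    uint64_t start = now_ns();

    while (count--)
        clock_gettime(CLOCK_REALTIME, &ts);

    return now_ns() - start;
}
```

Calling bench_clock_gettime(1000L * 1000 * 100) would correspond to the
iteration count in the pseudo code.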

Total elapsed time for the loop above:

 Elapsed time (ns) | rdtsc_ordered() | rdtsc()   | Improvement
-------------------+-----------------+-----------+-------------
 Physical machine  | 1269147289      | 759067324 | 40%
 Guest OS (KVM)    | 1756615963      | 995823886 | 43%
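
The improvement column is the relative reduction in elapsed time,
(ordered - plain) / ordered. A minimal sketch of that arithmetic, with
improvement_percent() as a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Relative reduction in elapsed time as a whole percentage, truncated
 * toward zero. Returns 0 when there is no reduction.
 */
static unsigned improvement_percent(uint64_t before_ns, uint64_t after_ns)
{
    if (before_ns == 0 || after_ns >= before_ns)
        return 0;

    return (unsigned)((before_ns - after_ns) * 100 / before_ns);
}
```

With the numbers above, improvement_percent(1269147289, 759067324) gives 40
and improvement_percent(1756615963, 995823886) gives 43.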

Signed-off-by: Rong Tao <rongtao@xxxxxxxx>
---
 arch/x86/include/asm/vdso/gettimeofday.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
index 4cf6794f9d68..342d29106208 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -228,7 +228,7 @@ static u64 vread_pvclock(void)
 		if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)))
 			return U64_MAX;

-		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
+		ret = __pvclock_read_cycles(pvti, rdtsc());
 	} while (pvclock_read_retry(pvti, version));

 	return ret;
@@ -246,7 +246,7 @@ static inline u64 __arch_get_hw_counter(s32 clock_mode,
 					const struct vdso_data *vd)
 {
 	if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
-		return (u64)rdtsc_ordered();
+		return (u64)rdtsc();
 	/*
 	 * For any memory-mapped vclock type, we need to make sure that gcc
 	 * doesn't cleverly hoist a load before the mode check. Otherwise we
--
2.39.1