[PATCH] timekeeping:add padding in timekeeper for Unixbench pipe

From: Guo Hui
Date: Fri Jan 06 2023 - 01:30:18 EST


When the LLC cache line size is 128 bytes, such as Kunpeng 920,
the seq attribute and xtime attribute in the structure tk_core
are completely in the same LLC cache line,
and xtime_sec is the data protected by the seq lock
in the function ktime_get_coarse_real_ts64,
so seq and xtime_sec are in the same LLC cache line
causing the false sharing problem.

Adding padding before xtime_sec in the structure timekeeper
is based on the comment of the structure tk_read_base: "This
struct has size 56 byte on 64 bit. Together with a seqcount
it occupies a single 64byte cache line." Therefore,
seq and the structure tk_read_base
should be placed in the same 64-byte cacheline.

The performance data of Unixbench pipe on Kunpeng 920 is as follows:

Enable the LSE instruction:
seq and xtime are in the same LLC cache line:
System Benchmarks Partial Index BASELINE RESULT INDEX
Pipe Throughput 12440.0 14800574.4 11897.6
Pipe-based Context Switching 4000.0 4357419.0 10893.5
========
System Benchmarks Index Score (Partial Only) 11384.5

seq and xtime are not in the same LLC cache line:
System Benchmarks Partial Index BASELINE RESULT INDEX
Pipe Throughput 12440.0 16546306.6 13300.9
Pipe-based Context Switching 4000.0 5654281.8 14135.7
========
System Benchmarks Index Score (Partial Only) 13711.9

When the LSE instruction is enabled,
Pipe Throughput increases by 11.79%,
and Pipe-based Context Switching increases by 29.76%.

Close the LSE instruction:
seq and xtime are in the same LLC cache line:
System Benchmarks Partial Index BASELINE RESULT INDEX
Pipe Throughput 12440.0 36375286.5 29240.6
Pipe-based Context Switching 4000.0 11994739.7 29986.8
========
System Benchmarks Index Score (Partial Only) 29611.4

seq and xtime are not in the same LLC cache line:
System Benchmarks Partial Index BASELINE RESULT INDEX
Pipe Throughput 12440.0 44887148.8 36082.9
Pipe-based Context Switching 4000.0 13666392.0 34166.0
========
System Benchmarks Index Score (Partial Only) 35111.4

When the LSE instruction is disabled,
Pipe Throughput increases by 23.40%,
and Pipe-based Context Switching increases by 13.94%.

Signed-off-by: Guo Hui <guohui@xxxxxxxxxxxxx>
---
include/linux/timekeeper_internal.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 84ff2844d..d363cd1f3 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -92,6 +92,7 @@ struct tk_read_base {
struct timekeeper {
struct tk_read_base tkr_mono;
struct tk_read_base tkr_raw;
+ u64 padding;
u64 xtime_sec;
unsigned long ktime_sec;
struct timespec64 wall_to_monotonic;
--
2.20.1