* Chen, Kenneth W <kenneth.w.chen@xxxxxxxxx> wrote:
Since we are talking about load balancing, we decided to measure
various values of the cache_hot_time variable to see how it affects
application performance. We first established a baseline with the
vanilla base kernel (default of 2.5ms), then swept that variable up to
1000ms. All of the experiments were done with Ingo's patch posted
earlier. Here are the results (the test environment is a 4-way SMP
machine with 32 GB of memory and 500 disks, running an industry
standard db transaction processing workload):
cache_hot_time | workload throughput
--------------------------------------
         2.5ms | 100.0 (0% idle)
           5ms | 106.0 (0% idle)
          10ms | 112.5 (1% idle)
          15ms | 111.6 (3% idle)
          25ms | 111.1 (5% idle)
         250ms | 105.6 (7% idle)
        1000ms | 105.4 (7% idle)
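To make the shape of that sweep easier to see, here is a small Python sketch that picks the best setting and computes the gain over the 2.5ms baseline. The numbers are copied from the table; the function names are made up for illustration and are not part of any patch.

```python
# (cache_hot_time in ms, workload throughput, idle %) copied from the table.
RESULTS = [
    (2.5, 100.0, 0),
    (5, 106.0, 0),
    (10, 112.5, 1),
    (15, 111.6, 3),
    (25, 111.1, 5),
    (250, 105.6, 7),
    (1000, 105.4, 7),
]


def best_setting(rows):
    """Return the row with the highest throughput."""
    return max(rows, key=lambda row: row[1])


def gain_over_baseline(rows, baseline_ms=2.5):
    """Map each cache_hot_time to its throughput gain (in %) over the baseline."""
    base = next(tput for ms, tput, _ in rows if ms == baseline_ms)
    return {ms: round((tput / base - 1) * 100, 1) for ms, tput, _ in rows}


if __name__ == "__main__":
    ms, tput, idle = best_setting(RESULTS)
    print(f"best: {ms}ms, throughput {tput}, {idle}% idle")
    for ms, gain in gain_over_baseline(RESULTS).items():
        print(f"{ms:>6}ms: {gain:+.1f}%")
```

In this data the peak sits at 10ms, and the larger values give back part of the gain while idle time climbs.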
The following patch adds a new feature to the scheduler: during bootup
it measures migration costs and sets up the cache_hot value accordingly.
The measurement is point-to-point, i.e. it can be used to measure the
migration costs in cache hierarchies - e.g. by NUMA setup code. The
patch prints out a matrix of migration costs between CPUs
(self-migration means pure cache dirtying cost).
Here are a couple of matrices from test systems:
A 2-way Celeron/128K box:
arch cache_decay_nsec: 1000000
migration cost matrix (cache_size: 131072, cpu: 467 MHz):
        [00]   [01]
[00]:    9.6   12.0
[01]:   12.2    9.8
min_delta: 12586890
using cache_decay nsec: 12586890 (12 msec)
a 2-way/4-way P4/512K HT box:
arch cache_decay_nsec: 2000000
migration cost matrix (cache_size: 524288, cpu: 2379 MHz):
        [00]   [01]   [02]   [03]
[00]:    6.1    6.1    5.7    6.1
[01]:    6.7    6.2    6.7    6.2
[02]:    5.9    5.9    6.1    5.0
[03]:    6.7    6.2    6.7    6.2
min_delta: 6053016
using cache_decay nsec: 6053016 (5 msec)
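A note on the "(N msec)" figures in this output: 6053016 nsec is printed as 5 msec, not 6. That is consistent with the value being shifted right by 20 bits (divided by 2^20) rather than divided by 10^6. The patch source is not shown here, so this is an inference from the printed numbers only. A quick Python check, with hypothetical helper names:

```python
def shifted_msec(nsec):
    """Approximate msec via a 20-bit right shift (assumed, not taken from the patch)."""
    return nsec >> 20


def decimal_msec(nsec):
    """Exact integer msec, i.e. nsec divided by 10^6 and truncated."""
    return nsec // 1_000_000


if __name__ == "__main__":
    # cache_decay values copied from the two boxes above.
    for nsec in (12586890, 6053016):
        print(nsec, shifted_msec(nsec), decimal_msec(nsec))
```

The shift reproduces both printed figures (12 and 5), while the decimal division would give 6 for the P4 box.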
an 8-way P3/2MB Xeon box:
arch cache_decay_nsec: 6000000
migration cost matrix (cache_size: 2097152, cpu: 700 MHz):
        [00]   [01]   [02]   [03]   [04]   [05]   [06]   [07]
[00]:   92.1  184.8  184.8  184.8  184.9   90.7   90.6   90.7
[01]:  181.3   92.7   88.5   88.6   88.5  181.5  181.3  181.4
[02]:  181.4   88.4   92.5   88.4   88.5  181.4  181.3  181.4
[03]:  181.4   88.4   88.5   92.5   88.4  181.5  181.2  181.4
[04]:  181.4   88.5   88.4   88.4   92.5  181.5  181.3  181.5
[05]:   87.2  181.5  181.4  181.5  181.4   90.0   87.0   87.1
[06]:   87.2  181.5  181.4  181.5  181.4   87.9   90.0   87.1
[07]:   87.2  181.5  181.4  181.5  181.4   87.9   87.0   90.0
min_delta: 91815564
using cache_decay nsec: 91815564 (87 msec)
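The 8-way matrix shows why a point-to-point measurement is useful: the costs fall into two clear bands (roughly 87-93 and 181-185), which suggests two groups of CPUs. The Python sketch below recovers those groups from the printed numbers. The midpoint threshold and the function name are assumptions made for illustration; this is not how the patch or any NUMA setup code does it.

```python
# Migration cost matrix for the 8-way Xeon box, values as printed above.
COSTS = [
    [92.1, 184.8, 184.8, 184.8, 184.9, 90.7, 90.6, 90.7],
    [181.3, 92.7, 88.5, 88.6, 88.5, 181.5, 181.3, 181.4],
    [181.4, 88.4, 92.5, 88.4, 88.5, 181.4, 181.3, 181.4],
    [181.4, 88.4, 88.5, 92.5, 88.4, 181.5, 181.2, 181.4],
    [181.4, 88.5, 88.4, 88.4, 92.5, 181.5, 181.3, 181.5],
    [87.2, 181.5, 181.4, 181.5, 181.4, 90.0, 87.0, 87.1],
    [87.2, 181.5, 181.4, 181.5, 181.4, 87.9, 90.0, 87.1],
    [87.2, 181.5, 181.4, 181.5, 181.4, 87.9, 87.0, 90.0],
]


def cheap_groups(costs, threshold=None):
    """Group CPUs whose mutual migration cost is below a threshold.

    The default threshold is the midpoint between the smallest and the
    largest entry (an arbitrary choice that only works when the costs
    form two well-separated bands, as they do here).
    """
    flat = [c for row in costs for c in row]
    if threshold is None:
        threshold = (min(flat) + max(flat)) / 2
    # Each row yields the set of CPUs that are "cheap" to migrate to.
    groups = {
        tuple(cpu for cpu, cost in enumerate(row) if cost < threshold)
        for row in costs
    }
    return sorted(groups)


if __name__ == "__main__":
    for group in cheap_groups(COSTS):
        print(group)
```

On this matrix the sketch yields CPUs 0, 5, 6, 7 in one group and CPUs 1, 2, 3, 4 in the other, matching the cheap and expensive bands visible in each row.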