[PATCH 2/3] sched/fair: Move hot load_avg into its own cacheline

From: Waiman Long
Date: Wed Nov 25 2015 - 14:11:14 EST


If a system with large number of sockets was driven to full
utilization, it was found that the clock tick handling occupied a
rather significant proportion of CPU time when fair group scheduling
was enabled which was true for most distributions.

Running a java benchmark on a 16-socket IvyBridge-EX system, the perf
profile looked like:

10.52% 0.00% java [kernel.vmlinux] [k] smp_apic_timer_interrupt
9.66% 0.05% java [kernel.vmlinux] [k] hrtimer_interrupt
8.65% 0.03% java [kernel.vmlinux] [k] tick_sched_timer
8.56% 0.00% java [kernel.vmlinux] [k] update_process_times
8.07% 0.03% java [kernel.vmlinux] [k] scheduler_tick
6.91% 1.78% java [kernel.vmlinux] [k] task_tick_fair
5.24% 5.04% java [kernel.vmlinux] [k] update_cfs_shares

In particular, the high CPU time consumed by update_cfs_shares()
was mostly due to contention on the cacheline that contained the
task_group's load_avg statistical counter. This cacheline may also
contains variables like shares, cfs_rq & se which are accessed rather
frequently during clock tick processing.

This patch moves the load_avg variable into another cacheline separated
from the other frequently accessed variables. By doing so, the perf
profile became:

9.44% 0.00% java [kernel.vmlinux] [k] smp_apic_timer_interrupt
8.74% 0.01% java [kernel.vmlinux] [k] hrtimer_interrupt
7.83% 0.03% java [kernel.vmlinux] [k] tick_sched_timer
7.74% 0.00% java [kernel.vmlinux] [k] update_process_times
7.27% 0.03% java [kernel.vmlinux] [k] scheduler_tick
5.94% 1.74% java [kernel.vmlinux] [k] task_tick_fair
4.15% 3.92% java [kernel.vmlinux] [k] update_cfs_shares

The %cpu time is still pretty high, but it is better than before. The
benchmark results before and after the patch was as follows:

Before patch - Max-jOPs: 907533 Critical-jOps: 134877
After patch - Max-jOPs: 916011 Critical-jOps: 142366

Signed-off-by: Waiman Long <Waiman.Long@xxxxxxx>
---
kernel/sched/sched.h | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index efd3bfc..e679895 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -248,7 +248,12 @@ struct task_group {
unsigned long shares;

#ifdef CONFIG_SMP
- atomic_long_t load_avg;
+ /*
+ * load_avg can be heavily contended at clock tick time, so put
+ * it in its own cacheline separated from the fields above which
+ * will also be accessed at each tick.
+ */
+ atomic_long_t load_avg ____cacheline_aligned;
#endif
#endif

--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/