[RFC PATCH v2 0/7] x86/idle: add halt poll support

From: Yang Zhang
Date: Tue Aug 29 2017 - 07:47:30 EST


Some latency-intensive workloads see an obvious performance drop when
running inside a VM, mainly because certain overheads are amplified by
virtualization. The largest cost I have observed is in the idle path.

This series introduces a new mechanism that polls for a while before
entering the idle state. If a reschedule is needed during the poll, we
avoid going through the heavy-overhead halt path.

Here is the data we get when running the contextswitch benchmark to
measure latency (lower is better):

1. w/o patch:
2493.14 ns/ctxsw -- 200.3 %CPU

2. w/ patch:
halt_poll_threshold=10000 -- 1485.96 ns/ctxsw -- 201.0 %CPU
halt_poll_threshold=20000 -- 1391.26 ns/ctxsw -- 200.7 %CPU
halt_poll_threshold=30000 -- 1488.55 ns/ctxsw -- 200.1 %CPU
halt_poll_threshold=500000 -- 1159.14 ns/ctxsw -- 201.5 %CPU

3. kvm dynamic poll
halt_poll_ns=10000 -- 2296.11 ns/ctxsw -- 201.2 %CPU
halt_poll_ns=20000 -- 2599.7 ns/ctxsw -- 201.7 %CPU
halt_poll_ns=30000 -- 2588.68 ns/ctxsw -- 211.6 %CPU
halt_poll_ns=500000 -- 2423.20 ns/ctxsw -- 229.2 %CPU

4. idle=poll
2050.1 ns/ctxsw -- 1003 %CPU

5. idle=mwait
2188.06 ns/ctxsw -- 206.3 %CPU

Here is the data we get when running the netperf benchmark:

1. w/o patch:
14556.8 bits/s -- 144.2 %CPU

2. w/ patch:
halt_poll_threshold=10000 -- 15803.89 bits/s -- 159.5 %CPU
halt_poll_threshold=20000 -- 15899.04 bits/s -- 161.5 %CPU
halt_poll_threshold=30000 -- 15642.38 bits/s -- 161.8 %CPU
halt_poll_threshold=40000 -- 18040.76 bits/s -- 184.0 %CPU
halt_poll_threshold=50000 -- 18877.61 bits/s -- 197.3 %CPU

3. kvm dynamic poll
halt_poll_ns=10000 -- 15876.00 bits/s -- 172.2 %CPU
halt_poll_ns=20000 -- 15602.58 bits/s -- 185.4 %CPU
halt_poll_ns=30000 -- 15930.69 bits/s -- 194.4 %CPU
halt_poll_ns=40000 -- 16413.09 bits/s -- 195.3 %CPU
halt_poll_ns=50000 -- 16417.42 bits/s -- 196.3 %CPU

4. idle=poll in guest
18441.3 bits/s -- 1003 %CPU

5. idle=mwait in guest
15760.6 bits/s -- 157.6 %CPU

V1 -> V2:
- integrate the smart halt poll into paravirt code
- use idle_stamp instead of check_poll
- since it is hard to tell whether a vCPU is the only task on its pCPU,
we do not consider that case in this series. (May improve it in future.)

Yang Zhang (7):
x86/paravirt: Add pv_idle_ops to paravirt ops
KVM guest: register kvm_idle_poll for pv_idle_ops
sched/idle: Add poll before enter real idle path
x86/paravirt: Add update in x86/paravirt pv_idle_ops
Documentation: Add three sysctls for smart idle poll
KVM guest: introduce smart idle poll algorithm
sched/idle: update poll time when wakeup from idle

Documentation/sysctl/kernel.txt | 25 +++++++++++++
arch/x86/include/asm/paravirt.h | 10 ++++++
arch/x86/include/asm/paravirt_types.h | 7 ++++
arch/x86/kernel/kvm.c | 67 +++++++++++++++++++++++++++++++++++
arch/x86/kernel/paravirt.c | 11 ++++++
arch/x86/kernel/process.c | 7 ++++
include/linux/kernel.h | 6 ++++
include/linux/sched/idle.h | 4 +++
kernel/sched/core.c | 4 +++
kernel/sched/idle.c | 9 +++++
kernel/sysctl.c | 23 ++++++++++++
11 files changed, 173 insertions(+)

--
1.8.3.1