[PATCH] KVM: vmx: fix underflow in TSC deadline calculation

From: Wanpeng Li
Date: Tue Jun 28 2016 - 02:54:29 EST


INFO: rcu_sched detected stalls on CPUs/tasks:
1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663
(detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R running task 0 3529 3525 0x00080808
ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
? kvm_write_guest_cached+0xb9/0x160 [kvm]
? __delay+0xf/0x20
? wait_lapic_expire+0x14a/0x200 [kvm]
? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0x5/0x210
? do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
? SyS_ioctl+0x79/0x90
? do_syscall_64+0x7c/0x1e0
? entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced readily by running a full dynticks guest(since hrtimer
in guest is heavily used) w/ lapic_timer_advance disabled.

If fail to program hardware preemption timer, we will fallback to hrtimer based
method, however, a previous programmed preemption timer miss to cancel in this
scenario which results in one hardware preemption timer and one hrtimer emulated
tsc deadline timer run simultaneously. So sometimes the target guest deadline
tsc is earlier than guest tsc, which leads to the computation in vmx_set_hv_timer
can underflow and cause delta_tsc to be set a huge value, then host soft lockup
as above.

This patch fix it by cancelling the previous programmed preemption timer if there
is once we failed to program the new preemption timer and fallback to hrtimer
based method.

Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx>
Cc: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>
Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
---
arch/x86/kvm/lapic.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fdc05ae..b15e32a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1454,11 +1454,18 @@ static void start_apic_timer(struct kvm_lapic *apic)
/* lapic timer in tsc deadline mode */
u64 tscdeadline = apic->lapic_timer.tscdeadline;

- if (kvm_x86_ops->set_hv_timer &&
- !kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
- apic->lapic_timer.hv_timer_in_use = true;
- trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
+ if (kvm_x86_ops->set_hv_timer) {
+ if (kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
+ if (apic->lapic_timer.hv_timer_in_use) {
+ kvm_x86_ops->cancel_hv_timer(apic->vcpu);
+ apic->lapic_timer.hv_timer_in_use = false;
+ }
+ start_sw_tscdeadline(apic);
+ } else {
+ apic->lapic_timer.hv_timer_in_use = true;
+ trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
apic->lapic_timer.hv_timer_in_use);
+ }
} else
start_sw_tscdeadline(apic);
}
--
1.9.1