[PATCH v3] KVM: x86/tsc: Don't sync user changes to TSC with KVM-initiated change

From: Like Xu
Date: Mon Jul 31 2023 - 04:12:27 EST


From: Like Xu <likexu@xxxxxxxxxxx>

Add kvm->arch.user_changed_tsc to avoid synchronizing user changes to
the TSC with the KVM-initiated change in kvm_arch_vcpu_postcreate() by
conditioning this mess on userspace having written the TSC at least
once already.

Here lies UAPI baggage: user-initiated TSC write with a small delta
(1 second) of virtual cycle time against real time is interpreted as an
attempt to synchronize the CPU. In such a scenario, the vcpu's tsc_offset
is not configured as expected, resulting in significant guest service
response latency, which is observed in our production environment.

Reported-by: Yong He <alexyonghe@xxxxxxxxxxx>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217423
Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
Suggested-by: Oliver Upton <oliver.upton@xxxxxxxxx>
Original-by: Oliver Upton <oliver.upton@xxxxxxxxx>
Tested-by: Like Xu <likexu@xxxxxxxxxxx>
Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
---
V2 -> V3 Changelog:
- Use the kvm->arch.user_changed_tsc proposal; (Oliver & Paolo)
V2: https://lore.kernel.org/kvm/20230724073516.45394-1-likexu@xxxxxxxxxxx/

arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 23 ++++++++++++++++-------
2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3bc146dfd38d..e8d423ef1474 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1303,6 +1303,7 @@ struct kvm_arch {
u64 cur_tsc_offset;
u64 cur_tsc_generation;
int nr_vcpus_matched_tsc;
+ bool user_changed_tsc;

u32 default_tsc_khz;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 278dbd37dab2..eeaf4ad9174d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2713,7 +2713,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
kvm_track_tsc_matching(vcpu);
}

-static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
+static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data, bool user_initiated)
{
struct kvm *kvm = vcpu->kvm;
u64 offset, ns, elapsed;
@@ -2734,20 +2734,29 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
* kvm_clock stable after CPU hotplug
*/
synchronizing = true;
- } else {
+ } else if (kvm->arch.user_changed_tsc) {
u64 tsc_exp = kvm->arch.last_tsc_write +
nsec_to_cycles(vcpu, elapsed);
u64 tsc_hz = vcpu->arch.virtual_tsc_khz * 1000LL;
/*
- * Special case: TSC write with a small delta (1 second)
- * of virtual cycle time against real time is
- * interpreted as an attempt to synchronize the CPU.
+ * Here lies UAPI baggage: user-initiated TSC write with
+ * a small delta (1 second) of virtual cycle time
+ * against real time is interpreted as an attempt to
+ * synchronize the CPU.
+ *
+ * Don't synchronize user changes to the TSC with the
+ * KVM-initiated change in kvm_arch_vcpu_postcreate()
+ * by conditioning this mess on userspace having
+ * written the TSC at least once already.
*/
synchronizing = data < tsc_exp + tsc_hz &&
data + tsc_hz > tsc_exp;
}
}

+ if (user_initiated)
+ kvm->arch.user_changed_tsc = true;
+
/*
* For a reliable TSC, we can match TSC offsets, and for an unstable
* TSC, we add elapsed time in this computation. We could let the
@@ -3776,7 +3785,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_IA32_TSC:
if (msr_info->host_initiated) {
- kvm_synchronize_tsc(vcpu, data);
+ kvm_synchronize_tsc(vcpu, data, true);
} else {
u64 adj = kvm_compute_l1_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
adjust_tsc_offset_guest(vcpu, adj);
@@ -11950,7 +11959,7 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
if (mutex_lock_killable(&vcpu->mutex))
return;
vcpu_load(vcpu);
- kvm_synchronize_tsc(vcpu, 0);
+ kvm_synchronize_tsc(vcpu, 0, false);
vcpu_put(vcpu);

/* poll control enabled by default */

base-commit: 5a7591176c47cce363c1eed704241e5d1c42c5a6
--
2.41.0