Re: [PATCH] kvm,x86: Use the refined tsc rate for the guest tsc.

From: Sean Christopherson
Date: Mon Feb 14 2022 - 21:18:51 EST


+Anton

On Fri, Aug 06, 2021, Sean Christopherson wrote:
> IIUC, this "fixes" a race where KVM is initialized before the second call to
> tsc_refine_calibration_work() completes. Fixes in quotes because it doesn't
> actually fix the race, it just papers over the problem to get the desired behavior.
> If the race can't be truly fixed, the changelog should explain why it can't be
> fixed, otherwise fudging our way around the race is not justifiable.
>
> Ideally, we would find a way to fix the race, e.g. by ensuring KVM can't load or
> by stalling KVM initialization until refinement completes (or fails). tsc_khz is
> consumed by KVM in multiple paths, and initializing KVM before tsc_khz calibration
> is fully refined means some part of KVM will use the wrong tsc_khz, e.g. the VMX
> preemption timer. Due to sanity checks in tsc_refine_calibration_work(), the delta
> won't be more than 1%, but it's still far from ideal.

Hmm, for systems with a constant TSC, KVM can fudge around the issue by not taking
a snapshot of tsc_khz (in max_tsc_khz) at init and instead reading tsc_khz when
each vCPU is created. It's still racy and potentially fragile, e.g. if userspace
manages to create a vCPU before tsc_khz is refined, but it's not a bad standalone
patch, and if it fixes your use case...

The only other alternative I can come up with is to add a one-off "notifier" for
KVM, but that's rather gross, especially since TSC refinement is (hopefully) headed
the way of the dodo.

Does this remedy your issues? Any idea if you need to support old CPUs that don't
provide a constant TSC?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eaa3b5b89c5e..6a75c2748bff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8708,13 +8708,13 @@ static int kvmclock_cpu_online(unsigned int cpu)
 
 static void kvm_timer_init(void)
 {
-	max_tsc_khz = tsc_khz;
-
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
 #ifdef CONFIG_CPU_FREQ
 		struct cpufreq_policy *policy;
 		int cpu;
 
+		max_tsc_khz = tsc_khz;
+
 		cpu = get_cpu();
 		policy = cpufreq_cpu_get(cpu);
 		if (policy) {
@@ -11144,7 +11144,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
-	kvm_set_tsc_khz(vcpu, max_tsc_khz);
+	kvm_set_tsc_khz(vcpu, max_tsc_khz ? : tsc_khz);
 	kvm_vcpu_reset(vcpu, false);
 	kvm_init_mmu(vcpu);
 	vcpu_put(vcpu);