Re: [tip:x86/asm] x86/kvm: On KVM re-enable (e.g. after suspend), update clocks

From: Paolo Bonzini
Date: Mon Dec 14 2015 - 05:18:44 EST




On 14/12/2015 09:16, tip-bot for Andy Lutomirski wrote:
> Commit-ID: 677a73a9aa5433ea728200c26a7b3506d5eaa92b
> Gitweb: http://git.kernel.org/tip/677a73a9aa5433ea728200c26a7b3506d5eaa92b
> Author: Andy Lutomirski <luto@xxxxxxxxxx>
> AuthorDate: Thu, 10 Dec 2015 19:20:18 -0800
> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
> CommitDate: Fri, 11 Dec 2015 08:56:02 +0100
>
> x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
>
> This gets rid of the "did TSC go backwards" logic and just
> updates all clocks. It should work better (no more disabling of
> fast timing) and more reliably (all of the clocks are actually
> updated).
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Brian Gerst <brgerst@xxxxxxxxx>
> Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Link: http://lkml.kernel.org/r/861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@xxxxxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> ---
> arch/x86/kvm/x86.c | 75 +++---------------------------------------------------
> 1 file changed, 3 insertions(+), 72 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 00462bd..6e32e87 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -123,8 +123,6 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
> unsigned int __read_mostly lapic_timer_advance_ns = 0;
> module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR);
>
> -static bool __read_mostly backwards_tsc_observed = false;
> -
> #define KVM_NR_SHARED_MSRS 16
>
> struct kvm_shared_msrs_global {
> @@ -1671,7 +1669,6 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
> &ka->master_cycle_now);
>
> ka->use_master_clock = host_tsc_clocksource && vcpus_matched
> - && !backwards_tsc_observed
> && !ka->boot_vcpu_runs_old_kvmclock;
>
> if (ka->use_master_clock)
> @@ -7366,88 +7363,22 @@ int kvm_arch_hardware_enable(void)
> struct kvm_vcpu *vcpu;
> int i;
> int ret;
> - u64 local_tsc;
> - u64 max_tsc = 0;
> - bool stable, backwards_tsc = false;
>
> kvm_shared_msr_cpu_online();
> ret = kvm_x86_ops->hardware_enable();
> if (ret != 0)
> return ret;
>
> - local_tsc = rdtsc();
> - stable = !check_tsc_unstable();
> list_for_each_entry(kvm, &vm_list, vm_list) {
> kvm_for_each_vcpu(i, vcpu, kvm) {
> - if (!stable && vcpu->cpu == smp_processor_id())
> + if (vcpu->cpu == smp_processor_id()) {
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> - if (stable && vcpu->arch.last_host_tsc > local_tsc) {
> - backwards_tsc = true;
> - if (vcpu->arch.last_host_tsc > max_tsc)
> - max_tsc = vcpu->arch.last_host_tsc;
> + kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE,
> + vcpu);
> }
> }
> }
>
> - /*
> - * Sometimes, even reliable TSCs go backwards. This happens on
> - * platforms that reset TSC during suspend or hibernate actions, but
> - * maintain synchronization. We must compensate. Fortunately, we can
> - * detect that condition here, which happens early in CPU bringup,
> - * before any KVM threads can be running. Unfortunately, we can't
> - * bring the TSCs fully up to date with real time, as we aren't yet far
> - * enough into CPU bringup that we know how much real time has actually
> - * elapsed; our helper function, get_kernel_ns() will be using boot
> - * variables that haven't been updated yet.
> - *
> - * So we simply find the maximum observed TSC above, then record the
> - * adjustment to TSC in each VCPU. When the VCPU later gets loaded,
> - * the adjustment will be applied. Note that we accumulate
> - * adjustments, in case multiple suspend cycles happen before some VCPU
> - * gets a chance to run again. In the event that no KVM threads get a
> - * chance to run, we will miss the entire elapsed period, as we'll have
> - * reset last_host_tsc, so VCPUs will not have the TSC adjusted and may
> - * loose cycle time. This isn't too big a deal, since the loss will be
> - * uniform across all VCPUs (not to mention the scenario is extremely
> - * unlikely). It is possible that a second hibernate recovery happens
> - * much faster than a first, causing the observed TSC here to be
> - * smaller; this would require additional padding adjustment, which is
> - * why we set last_host_tsc to the local tsc observed here.
> - *
> - * N.B. - this code below runs only on platforms with reliable TSC,
> - * as that is the only way backwards_tsc is set above. Also note
> - * that this runs for ALL vcpus, which is not a bug; all VCPUs should
> - * have the same delta_cyc adjustment applied if backwards_tsc
> - * is detected. Note further, this adjustment is only done once,
> - * as we reset last_host_tsc on all VCPUs to stop this from being
> - * called multiple times (one for each physical CPU bringup).
> - *
> - * Platforms with unreliable TSCs don't have to deal with this, they
> - * will be compensated by the logic in vcpu_load, which sets the TSC to
> - * catchup mode. This will catchup all VCPUs to real time, but cannot
> - * guarantee that they stay in perfect synchronization.
> - */
> - if (backwards_tsc) {
> - u64 delta_cyc = max_tsc - local_tsc;
> - backwards_tsc_observed = true;
> - list_for_each_entry(kvm, &vm_list, vm_list) {
> - kvm_for_each_vcpu(i, vcpu, kvm) {
> - vcpu->arch.tsc_offset_adjustment += delta_cyc;
> - vcpu->arch.last_host_tsc = local_tsc;
> - kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
> - }
> -
> - /*
> - * We have to disable TSC offset matching.. if you were
> - * booting a VM while issuing an S4 host suspend....
> - * you may have some problem. Solving this issue is
> - * left as an exercise to the reader.
> - */
> - kvm->arch.last_tsc_nsec = 0;
> - kvm->arch.last_tsc_write = 0;
> - }
> -
> - }
> return 0;
> }
>
>

NACK.

Please remove this patch from the tip tree.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/