Re: [3/8] x86/tsc: Store and check TSC ADJUST MSR

From: Henning Schild
Date: Fri Jan 27 2017 - 08:57:38 EST


Thomas,

did you by any chance look into TSC synchronization by adjusting the
absolute value (MSR_IA32_TSC) as well? As far as I have seen, Linux did
that a long time ago, and it was eventually stopped because it caused
more harm than good.
https://github.com/torvalds/linux/commit/95492e4646e5de8b43d9a7908d6177fb737b61f0

Before your series, the TSC code as a whole had not been touched in a
long time, apart from a few attempts to introduce synchronization over
the years, e.g.
https://lkml.org/lkml/2015/11/9/639
which turned out to be the BIOS messing with the ADJUST MSR.

The ADJUST MSR offers an easy way to synchronize, yet taking care of
all the special cases still resulted in an 8-patch series. Syncing via
the absolute value instead is likely much harder, but this series
might be a good foundation for that.

The big question is whether we can rely on all future CPUs supporting
that MSR. Do "new MSRs" ever disappear again? If we cannot rely on the
ADJUST MSR, now might be a good time to revisit the idea of syncing
the absolute values.

I remember reading somewhere that this series might get backported to
longterm kernels; what is the status on that?

regards,
Henning

On Sat, 19 Nov 2016 13:47:36 +0000
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> The TSC_ADJUST MSR shows whether the TSC has been modified. This is
> helpful in two aspects:
>
> 1) It allows detecting BIOS wreckage, where SMM code tries to 'hide'
> the cycles spent by storing the TSC value at SMM entry and restoring
> it at SMM exit. On affected machines the TSCs run slowly out of sync
> up to the point where the clocksource watchdog (if available) detects
> it.
>
> The TSC_ADJUST MSR allows detecting the TSC modification before
> that and eventually restoring it. This is also important for SoCs which
> have no watchdog clocksource and therefore TSC wreckage cannot be
> detected and acted upon.
>
> 2) All threads in a package are required to have the same TSC_ADJUST
> value. Broken BIOSes break that and as a result the TSC
> synchronization check fails.
>
> The TSC_ADJUST MSR allows detecting the deviation when a CPU comes
> online. If detected, set it to the value of an already online CPU
> in the same package. This also allows reducing the number of sync
> tests because with that in place the test is only required for the
> first CPU in a package.
>
> In principle all CPUs in a system should have the same TSC_ADJUST
> value even across packages, but with physical CPU hotplug this
> assumption is not true because the TSC starts with power on, so
> physical hotplug has to do some trickery to bring the TSC into sync
> with already running packages, which requires using a TSC_ADJUST
> value different from CPUs which got powered up earlier.
>
> A final enhancement is the opportunity to compensate for unsynced
> TSCs across nodes at boot time and make the TSC usable that way. It
> won't help for TSCs which run apart due to frequency skew between
> packages, but this gets detected by the clocksource watchdog later.
>
> The first step toward this is to store the TSC_ADJUST value of a
> starting CPU and compare it with the value of an already online CPU
> in the same package. If they differ, emit a warning and adjust it to
> the reference value. The !SMP version just stores the boot value for
> later verification.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
> arch/x86/include/asm/tsc.h | 6 +++
> arch/x86/kernel/Makefile | 2 -
> arch/x86/kernel/tsc.c | 2 +
> arch/x86/kernel/tsc_sync.c | 88 +++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 97 insertions(+), 1 deletion(-)
>
> --- a/arch/x86/include/asm/tsc.h
> +++ b/arch/x86/include/asm/tsc.h
> @@ -48,6 +48,12 @@ extern int tsc_clocksource_reliable;
> extern void check_tsc_sync_source(int cpu);
> extern void check_tsc_sync_target(void);
>
> +#ifdef CONFIG_X86_TSC
> +extern void tsc_store_and_check_tsc_adjust(void);
> +#else
> +static inline void tsc_store_and_check_tsc_adjust(void) { }
> +#endif
> +
> extern int notsc_setup(char *);
> extern void tsc_save_sched_clock_state(void);
> extern void tsc_restore_sched_clock_state(void);
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -75,7 +75,7 @@ apm-y := apm_32.o
> obj-$(CONFIG_APM) += apm.o
> obj-$(CONFIG_SMP) += smp.o
> obj-$(CONFIG_SMP) += smpboot.o
> -obj-$(CONFIG_SMP) += tsc_sync.o
> +obj-$(CONFIG_X86_TSC) += tsc_sync.o
> obj-$(CONFIG_SMP) += setup_percpu.o
> obj-$(CONFIG_X86_MPPARSE) += mpparse.o
> obj-y += apic/
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1365,6 +1365,8 @@ void __init tsc_init(void)
>
> if (unsynchronized_tsc())
> mark_tsc_unstable("TSCs unsynchronized");
> + else
> + tsc_store_and_check_tsc_adjust();
>
> check_system_tsc_reliable();
>
> --- a/arch/x86/kernel/tsc_sync.c
> +++ b/arch/x86/kernel/tsc_sync.c
> @@ -14,12 +14,95 @@
> * ( The serial nature of the boot logic and the CPU hotplug lock
> * protects against more than 2 CPUs entering this code. )
> */
> +#include <linux/topology.h>
> #include <linux/spinlock.h>
> #include <linux/kernel.h>
> #include <linux/smp.h>
> #include <linux/nmi.h>
> #include <asm/tsc.h>
>
> +struct tsc_adjust {
> + s64 bootval;
> + s64 adjusted;
> +};
> +
> +static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
> +
> +#ifndef CONFIG_SMP
> +void __init tsc_store_and_check_tsc_adjust(void)
> +{
> + struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
> + s64 bootval;
> +
> + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> + return;
> +
> + rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
> + cur->bootval = bootval;
> + cur->adjusted = bootval;
> + pr_info("TSC ADJUST: Boot CPU0: %lld\n", bootval);
> +}
> +
> +#else /* !CONFIG_SMP */
> +
> +/*
> + * Store and check the TSC ADJUST MSR if available
> + */
> +void tsc_store_and_check_tsc_adjust(void)
> +{
> + struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
> + unsigned int refcpu, cpu = smp_processor_id();
> + s64 bootval;
> +
> + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> + return;
> +
> + rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
> + cur->bootval = bootval;
> +
> + /*
> + * Check whether this CPU is the first in a package to come up. In
> + * this case do not check the boot value against another package
> + * because the package might have been physically hotplugged, where
> + * TSC_ADJUST is expected to be different.
> + */
> + refcpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
> +
> + if (refcpu >= nr_cpu_ids) {
> + /*
> + * First online CPU in a package stores the boot value in
> + * the adjustment value. This value might change later via
> + * the sync mechanism. If that fails we still can yell
> + * about boot values not being consistent.
> + */
> + cur->adjusted = bootval;
> + pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu, bootval);
> + return;
> + }
> +
> + ref = per_cpu_ptr(&tsc_adjust, refcpu);
> + /*
> + * Compare the boot value and complain if it differs in the
> + * package.
> + */
> + if (bootval != ref->bootval) {
> + pr_warn("TSC ADJUST differs: Reference CPU%u: %lld CPU%u: %lld\n",
> + refcpu, ref->bootval, cpu, bootval);
> + }
> + /*
> + * The TSC_ADJUST values in a package must be the same. If the boot
> + * value on this newly upcoming CPU differs from the adjustment
> + * value of the already online CPU in this package, set it to that
> + * adjusted value.
> + */
> + if (bootval != ref->adjusted) {
> + pr_warn("TSC ADJUST synchronize: Reference CPU%u: %lld CPU%u: %lld\n",
> + refcpu, ref->adjusted, cpu, bootval);
> + cur->adjusted = ref->adjusted;
> + wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
> + }
> +}
> +
> /*
> * Entry/exit counters that make sure that both CPUs
> * run the measurement code at once:
> @@ -202,6 +285,9 @@ void check_tsc_sync_target(void)
> if (unsynchronized_tsc() || tsc_clocksource_reliable)
> return;
>
> + /* Store and check the TSC ADJUST MSR */
> + tsc_store_and_check_tsc_adjust();
> +
> /*
> * Register this CPU's participation and wait for the
> * source CPU to start the measurement:
> @@ -223,3 +309,5 @@ void check_tsc_sync_target(void)
> while (atomic_read(&stop_count) != cpus)
> cpu_relax();
> }
> +
> +#endif /* CONFIG_SMP */