Re: [PATCH] x86/tsc: RFC: re-synchronize TSCs to boot cpu TSC

From: Thomas Gleixner
Date: Fri Dec 09 2016 - 05:03:04 EST


On Fri, 9 Dec 2016, Roland Scheidegger wrote:
>
> I saw some system lockups though:
> When doing a cold boot, this kernel never managed to boot up. The last
> message seen is:
> x86: Booting SMP configuration:
> .... node #0, CPUs: #1

Weird. That really would be interesting to figure out what goes wrong
there. What bothers me is that we don't see something like this:

> [ 0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1:
> -577423766270

Can you please apply the debug patch below and provide the output ?


> [ 0.094492] x86: Booting SMP configuration:
> [ 0.094534] .... node #0, CPUs: #1
> [ 0.172334] TSC ADJUST differs: Reference CPU0: -577421768610 CPU1:
> -577423766270

What on earth is this BIOS doing? That's a couple of minutes back in time.

And the difference between the max and min adjust value is 2050932 cycles.

> Without the patches on cold boot it just was as expected:
> [ 0.093700] x86: Booting SMP configuration:
> [ 0.093737] .... node #0, CPUs: #1
> [ 0.174304] TSC synchronization [CPU#0 -> CPU#1]:
> [ 0.174375] Measured 1837188 cycles TSC warp between CPUs, turning
> off TSC clock.

Not surprising given the above numbers.

Thanks,

tglx

8<-------------------------
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -95,6 +95,8 @@ bool tsc_store_and_check_tsc_adjust(void
cur->nextcheck = jiffies + HZ;
cur->warned = false;

+ pr_info("TSC ADJUST: CPU%u: %lld\n", cpu, bootval);
+
/*
* Check whether this CPU is the first in a package to come up. In
* this case do not check the boot value against another package
@@ -282,17 +284,22 @@ void check_tsc_sync_source(int cpu)
else
atomic_set(&test_runs, 3);
retry:
+ pr_info("TSC source sync %d -> %d runs %d\n", smp_processor_id(),
+ cpu, atomic_read(&test_runs));
/*
* Wait for the target to start or to skip the test:
*/
while (atomic_read(&start_count) != cpus - 1) {
if (atomic_read(&skip_test) > 0) {
+ pr_info("TSC source sync skipped\n");
atomic_set(&skip_test, 0);
return;
}
cpu_relax();
}

+ pr_info("TSC source sync starting\n");
+
/*
* Trigger the target to continue into the measurement too:
*/
@@ -300,6 +307,7 @@ void check_tsc_sync_source(int cpu)

check_tsc_warp(loop_timeout(cpu));

+ pr_info("TSC source sync wait for stop\n");
while (atomic_read(&stop_count) != cpus-1)
cpu_relax();

@@ -311,7 +319,7 @@ void check_tsc_sync_source(int cpu)
if (!nr_warps) {
atomic_set(&test_runs, 0);

- pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
+ pr_info("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
smp_processor_id(), cpu);

} else if (atomic_dec_and_test(&test_runs) || random_warps) {
@@ -367,6 +375,7 @@ void check_tsc_sync_target(void)
* successful skip the test.
*/
if (tsc_store_and_check_tsc_adjust()) {
+ pr_info("TSC target sync skip\n");
atomic_inc(&skip_test);
return;
}
@@ -376,6 +385,7 @@ void check_tsc_sync_target(void)
* Register this CPU's participation and wait for the
* source CPU to start the measurement:
*/
+ pr_info("TSC target sync starting\n");
atomic_inc(&start_count);
while (atomic_read(&start_count) != cpus)
cpu_relax();
@@ -392,6 +402,7 @@ void check_tsc_sync_target(void)
*/
atomic_inc(&stop_count);

+ pr_info("TSC target sync stop\n");
/*
* Wait for the source CPU to print stuff:
*/
@@ -403,6 +414,8 @@ void check_tsc_sync_target(void)
*/
atomic_set(&stop_count, 0);

+ pr_info("TSC target sync remaining runs %d\n", atomic_read(&test_runs));
+
/*
* Check the number of remaining test runs. If not zero, the test
* failed and a retry with adjusted TSC is possible. If zero the