[PATCH v2] clocksource: disable watchdog checks on TSC when TSC is watchdog

From: Jiri Wiesner
Date: Wed Jan 24 2024 - 09:54:39 EST


Change the semantics of the tsc=watchdog option to always remove the
CLOCK_SOURCE_MUST_VERIFY bit from the TSC clocksource so as to provide an
alternative to the tsc=nowatchdog option, which is used routinely to
prevent switches to the HPET clocksource on machines with hardware issues.

Currently, the TSC does not get checked on machines with a stable TSC (4
NUMA nodes or less, CONSTANT_TSC, NONSTOP_TSC and TSC_ADJUST), and the
tsc=watchdog option allows one to have the PMTMR (acpi_pm) checked
instead. There are production machines that do not quality for having the
clocksource watchdog disabled (usually on account of having more than 4
NUMA nodes) and that also may have a malfunctioning CPU that causes a
watchdog check failure in a few days after a reboot resulting in the
current clocksource being switched to the HPET. Solving a hardware issue
or replacing the machine may take an inordinate amount of time, while the
production machine is still needed to do its job.

It is possible to pass tsc=nowatchdog to the kernel, which disables the
clocksource watchdog on the TSC, effectively leaving the operator of the
production machine blind to any clocksource malfunction. The sematics of
the recently introduced tsc=watchdog option, 0051293c5330 ("clocksource:
Enable TSC watchdog checking of HPET and PMTMR only when requested"), is
changed by this patch so that the TSC does not get checked when
tsc=watchdog is specified regardless of the TSC passing or not passing the
criteria for being judged to be stable. This way, the TSC acts as a
watchdog checking other clocksources (HPET, PMTMR) and a failed check
does not result in the current clocksource (TSC) getting marked unstable.
Passing tsc=watchdog has the advantange of the operator being informed
that the machine may be experiencing clocksource issues.

There is one side effect of the semantic change of tsc=watchdog - it fixes
the HPET not having its CLOCK_SOURCE_MUST_VERIFY bit set as intended by
efc8b329c7fd ("clocksource: Verify HPET and PMTMR when TSC unverified").
When the system is booting up, the HPET does not have its
CLOCK_SOURCE_MUST_VERIFY bit set on account of the order in which
clocksources are initialized in x86_late_time_init(). The HPET
initialization and clocksource registration always comes before the TSC
init function unsets the CLOCK_SOURCE_MUST_VERIFY bit.

Fixes: efc8b329c7fd ("clocksource: Verify HPET and PMTMR when TSC unverified")
Signed-off-by: Jiri Wiesner <jwiesner@xxxxxxx>
---
v2: the changelog was updated to improve intelligibility

Documentation/admin-guide/kernel-parameters.txt | 9 +++++----
arch/x86/include/asm/time.h | 2 +-
arch/x86/kernel/hpet.c | 2 +-
arch/x86/kernel/tsc.c | 7 +++----
drivers/clocksource/acpi_pm.c | 2 +-
5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 31b3a25680d0..860896571c04 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6722,10 +6722,11 @@
obtained from HW or FW using either an MSR or CPUID(0x15).
Warn if the difference is more than 500 ppm.
[x86] watchdog: Use TSC as the watchdog clocksource with
- which to check other HW timers (HPET or PM timer), but
- only on systems where TSC has been deemed trustworthy.
- This will be suppressed by an earlier tsc=nowatchdog and
- can be overridden by a later tsc=nowatchdog. A console
+ which to check other HW timers (HPET or PM timer).
+ TSC is not checked by the watchdog, even on systems
+ where TSC has not been deemed trustworthy. This will be
+ suppressed by an earlier tsc=nowatchdog and can be
+ overridden by a later tsc=nowatchdog. A console
message will flag any such suppression or overriding.

tsc_early_khz= [X86] Skip early TSC calibration and use the given
diff --git a/arch/x86/include/asm/time.h b/arch/x86/include/asm/time.h
index f360104ed172..c2364b74a318 100644
--- a/arch/x86/include/asm/time.h
+++ b/arch/x86/include/asm/time.h
@@ -7,7 +7,7 @@

extern void hpet_time_init(void);
extern bool pit_timer_init(void);
-extern bool tsc_clocksource_watchdog_disabled(void);
+extern bool tsc_clocksource_as_watchdog(void);

extern struct clock_event_device *global_clock_event;

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index a38d0c93a66e..46f2b4ffdba7 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -1091,7 +1091,7 @@ int __init hpet_enable(void)
if (!hpet_counting())
goto out_nohpet;

- if (tsc_clocksource_watchdog_disabled())
+ if (tsc_clocksource_as_watchdog())
clocksource_hpet.flags |= CLOCK_SOURCE_MUST_VERIFY;
clocksource_register_hz(&clocksource_hpet, (u32)hpet_freq);

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 15f97c0abc9d..ec1860178ea1 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1221,10 +1221,9 @@ static void __init tsc_disable_clocksource_watchdog(void)
clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
}

-bool tsc_clocksource_watchdog_disabled(void)
+bool tsc_clocksource_as_watchdog(void)
{
- return !(clocksource_tsc.flags & CLOCK_SOURCE_MUST_VERIFY) &&
- tsc_as_watchdog && !no_tsc_watchdog;
+ return tsc_as_watchdog && !no_tsc_watchdog;
}

static void __init check_system_tsc_reliable(void)
@@ -1609,7 +1608,7 @@ void __init tsc_init(void)
return;
}

- if (tsc_clocksource_reliable || no_tsc_watchdog)
+ if (tsc_clocksource_reliable || no_tsc_watchdog || tsc_as_watchdog)
tsc_disable_clocksource_watchdog();

clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 82338773602c..9b5dad94713e 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -211,7 +211,7 @@ static int __init init_acpi_pm_clocksource(void)
return -ENODEV;
}

- if (tsc_clocksource_watchdog_disabled())
+ if (tsc_clocksource_as_watchdog())
clocksource_acpi_pm.flags |= CLOCK_SOURCE_MUST_VERIFY;
return clocksource_register_hz(&clocksource_acpi_pm, PMTMR_TICKS_PER_SEC);
}
--
2.35.3


--
Jiri Wiesner
SUSE Labs