Re: [PATCH v2 clocksource 6/7] clocksource: Verify HPET and PMTMR when TSC unverified

From: Paul E. McKenney
Date: Wed Feb 01 2023 - 14:55:23 EST


On Wed, Feb 01, 2023 at 02:26:29PM -0500, Waiman Long wrote:
> On 2/1/23 05:24, Thomas Gleixner wrote:
> > Paul!
> >
> > On Tue, Jan 24 2023 at 16:27, Paul E. McKenney wrote:
> > > On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC,
> > > NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the
> > > TSC is disabled. This works well much of the time, but there is the
> > > occasional production-level system that meets all of these criteria, but
> > > which still has a TSC that skews significantly from atomic-clock time.
> > > This is usually attributed to a firmware or hardware fault. Yes, the
> > > various NTP daemons do express their opinions of userspace-to-atomic-clock
> > > time skew, but they put them in various places, depending on the daemon
> > > and distro in question. It would therefore be good for the kernel to
> > > have some clue that there is a problem.
> > >
> > > The old behavior of marking the TSC unstable is a non-starter because a
> > > great many workloads simply cannot tolerate the overheads and latencies
> > > of the various non-TSC clocksources. In addition, NTP-corrected systems
> > > sometimes can tolerate significant kernel-space time skew as long as
> > > the userspace time sources are within epsilon of atomic-clock time.
> > >
> > > Therefore, when watchdog verification of TSC is disabled, enable it for
> > > HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel
> > > time-skew diagnostic without degrading the system's performance.
> > I'm more than unhappy about this. We finally have a point where the TSC
> > watchdog overhead can go away without adding TSC=reliable to the kernel
> > commandline.
> >
> > Now you add an unconditionally enforce the watchdog again in a way which
> > even cannot be disabled on the kernel command line.
> >
> > Patently bad idea, no cookies for you!
>
> I have a similar concern about this patch as well. That is why I was
> suggesting to have this enabled for a limited time after boot for sanity
> checking purpose only.

Fair enough!

If the watchdog checking of HPET and/or PMTMR against TSC only happens
only when the sysadm asks for it, would you still want to have the ability
to enable such watchdog checking at boot time, and then to disable it
once the system had been running for some limited time?

Thanx, Paul