Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms

From: Paul E. McKenney
Date: Wed Dec 01 2021 - 12:53:02 EST


On Wed, Dec 01, 2021 at 09:26:55AM +0800, Feng Tang wrote:
> On Tue, Nov 30, 2021 at 03:37:26PM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 01, 2021 at 12:19:43AM +0100, Thomas Gleixner wrote:
> > > On Tue, Nov 30 2021 at 14:48, Paul E. McKenney wrote:
> > > > On Tue, Nov 30, 2021 at 10:55:45PM +0100, Thomas Gleixner wrote:
> > > >> > OK, HPET or nothing, then.
> > > >>
> > > >> Older machines also have pm_timer. But those beasts seem to have lost
> > > >> that too.
> > > >
> > > > I suppose that one way of avoiding clock-skew messages is to have only
> > > > one clock.
> > >
> > > Indeed. It's a complete mystery why it takes ages to implement reliable
> > > clocks in hardware.
> >
> > That one is easy. It is because the previous clocksource watchdog was
> > too lenient. ;-)
> >
> > (Sorry, couldn't resist...)
> >
> > > >> >> We really need to remove the watchdog requirement for modern hardware.
> > > >> >> Let me stare at those patches and get them merged.
> > > >> >
> > > >> > You are more trusting of modern hardware than I am, but for all I know,
> > > >> > maybe rightfully so. ;-)
> > > >>
> > > >> Well, I rather put a bet on the hardware, which has become reasonable
> > > >> over the last decade, than on trying to solve a circular dependency
> > > >> problem with tons of heuristics which won't ever work correctly.
> > > >
> > > > Use of HPET to check the interval length would not be circular, right?
> > >
> > > As long as the HPET works reliably :)
> >
> > Is it also a complete mystery why clocksources previously deemed
> > reliable no longer work reliably? ;-)
>
> For HPET, it's a long story :) Back in 2012 or so, the HPET on the Baytrail
> platform gained a new feature: it stops counting in PC10 (a cpuidle
> state), which prevents it from being a clocksource, so we had to disable
> HPET explicitly for that platform. Since then, some new platforms also
> have the same feature, and their HPET got disabled too.

I must confess that I have been involved in similar things more times
than I care to admit. ;-)

So the upshot is that if HPET does not work, it should be disabled,
in which case the clocksource watchdog will be ignoring it. In cases
where HPET can stop counting and is not disabled in the Linux kernel,
that is a bug that needs to be fixed by disabling HPET for those cases.

If TSC is the only clocksource, then the clocksource watchdog won't be
checking it.

All of this points to using the presumed-good clocksource to measure the
time between clocksource-watchdog checks, while excluding any silly cases.
One such case, as Thomas Gleixner suggested, is the jiffies counter
trying to be the presumed-good clocksource.
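
To make that concrete, here is a rough user-space sketch, not kernel
code. The struct, the function names, the "coarse" flag standing in for
jiffies, and the 2x-interval cutoff are all made up for illustration;
the real watchdog logic is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a clocksource (not the kernel's). */
struct cs {
	const char *name;
	bool disabled;	/* e.g. an HPET that stops counting in PC10 */
	bool coarse;	/* e.g. jiffies: a "silly" choice of reference */
};

/*
 * Pick the presumed-good reference for checking @checked.
 * Returns NULL when nothing qualifies, in which case @checked
 * (say, a lone TSC) simply goes unchecked.
 */
static const struct cs *pick_reference(const struct cs *list, size_t n,
				       const struct cs *checked)
{
	for (size_t i = 0; i < n; i++) {
		const struct cs *c = &list[i];

		if (c == checked || c->disabled || c->coarse)
			continue;
		return c;
	}
	return NULL;
}

enum wd_result { WD_OK, WD_SKEW, WD_SKIPPED };

/*
 * One watchdog check. @ref_delta_ns is the interval as measured by the
 * reference clocksource, @cs_delta_ns the same interval as measured by
 * the clocksource under test. The 2x cutoff is an assumed value.
 */
static enum wd_result wd_check(uint64_t ref_delta_ns, uint64_t cs_delta_ns,
			       uint64_t expected_ns, uint64_t max_skew_ns)
{
	uint64_t diff;

	assert(expected_ns > 0);

	/* The check itself ran late, so the comparison is not trustworthy. */
	if (ref_delta_ns > 2 * expected_ns)
		return WD_SKIPPED;

	diff = ref_delta_ns > cs_delta_ns ? ref_delta_ns - cs_delta_ns
					  : cs_delta_ns - ref_delta_ns;
	return diff > max_skew_ns ? WD_SKEW : WD_OK;
}
```

The interval length comes from the reference clocksource itself rather
than from jiffies, and a delayed check is skipped instead of being
reported as skew.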

Thanx, Paul