Re: [PATCH v10 clocksource 6/7] clocksource: Forgive tsc_early pre-calibration drift

From: Thomas Gleixner
Date: Tue Apr 27 2021 - 17:03:48 EST


On Mon, Apr 26 2021 at 23:01, Feng Tang wrote:
> On Sun, Apr 25, 2021 at 03:47:07PM -0700, Paul E. McKenney wrote:
> We've reported one case that tsc can be wrongly judged as 'unstable'
> by 'refined-jiffies' watchdog [1], while reducing the threshold could
> make it easier to be triggered.
>
> It could be reproduced on the a plaform with a 115200 serial console,
> and hpet been disabled (several x86 platforms has this), add
> 'initcall_debug' cmdline parameter to get more debug message, we can
> see:
>
> [ 1.134197] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc-early' as unstable because the skew is too large:
> [ 1.134214] clocksource: 'refined-jiffies' wd_nesc: 500000000 wd_now: ffff8b35 wd_last: ffff8b03 mask: ffffffff

refined-jiffies is the worst of all watchdogs and this obviously cannot
be fixed at all simply because we can lose ticks in that mode. And no,
we cannot compensate for lost ticks via TSC which we in turn "monitor"
via ticks.

Even if we hack around it and make it "work" then the TSC will never
become fully trusted because refined-jiffies cannot support NOHZ/HIGHRES
mode for obvious reasons either. So the system stays in periodic mode
forever.

If there is no independent timer to validate against then the TSC is
better stable and the BIOS/SMM code has to be trusted not to wreckage
TSC. There are no other options.

So TBH, I do not care about this case at all. It's pointless to even
think about it. Either the TSC works on these systems or it doesn't. If
it doesn't, then you have to keep the pieces.

I'm so dead tired of this especially since this is known forever. But
it's obviously better to waste our time than to fix the damned hardware.

Thanks,

tglx