Re: [PATCH clocksource 1/3] clocksource: Reject bogus watchdog clocksource measurements

From: Thomas Gleixner
Date: Thu Nov 17 2022 - 16:59:17 EST


Paul!

On Mon, Nov 14 2022 at 15:28, Paul E. McKenney wrote:
>
> + /* Check for bogus measurements. */
> + wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> + if (wd_nsec < (wdi >> 2)) {
> + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> + continue;
> + }
> + if (wd_nsec > (wdi << 2)) {
> + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> + continue;
> + }

This is really getting ridiculous.

The clocksource watchdog is supposed to run periodically with period =
WATCHDOG_INTERVAL.

That periodic schedule depends on the clocksource which is monitored. If
the clocksource runs fast the period is shortened and if it runs slow is
prolonged.

Now you add checks:

1) If the period observed by the watchdog clocksource is less than 1/4
of the expected period, everything is fine.

2) If the period observed by the watchdog clocksource is greater than 4
times the expected period, everything is fine.

IOW, you are preventing detection of one class of problems which caused
us to implement the watchdog clocksource in the first place.

You are preventing it by making the watchdog decision circular dependent
on the clocksource it is supposed to watch. IOW, you put a fox in charge
of the henhouse. That's a really brilliant plan.

But what's worse is the constant stream of heuristics which make the
clocksource watchdog "work" under workloads which are simply impossible
to be handled by its current implementation.

If I look at the full set of them by now, I'm pretty sure that a real
TSC fail would not be noticed anymore because there are more exceptions
and excuses why a particular measurement it bogus or invalid or
whatever.

I didn't do a full analysis yet, but I have a hard time to convince
myself that - assumed we add this gem - the watchdog will be anything
else than a useless waste of CPU cycles as there is always one of the
heuristics declaring that everything is fine along with incomprehensible
messages in dmesg which will create more confusion to most people than
being helpful.

This is hunting us for 20+ years now and why do we still need this? I'm
pretty sure that farcebook does not run their server farms on 20 years
old silicon.

That means even the largest customers have not been able to convince the
CPU manufactures to fix this idiocy by now, right? Either that or they
did not even try because it's simpler to "fix" it in software.

That's the only two explanations I have for the constant stream of
voodoo logic. Both are just a proof for my claim that this industry just
"works" by chance.

Can we stop this pretty please and either come up with something
fundamentally different or just admit defeat and remove the whole thing?

Thanks,

tglx