Re: [PATCH 0/2] clocksource: Avoid incorrect hpet fallback

From: Waiman Long
Date: Wed Nov 10 2021 - 20:30:20 EST



On 11/10/21 20:23, Feng Tang wrote:
Hi Waiman, Paul,

On Wed, Nov 10, 2021 at 05:17:30PM -0500, Waiman Long wrote:
It was found that when an x86 system was being stressed by running
various different benchmark suites, the clocksource watchdog might
occasionally mark TSC as unstable and fall back to hpet, which will
have a significant impact on system performance.
We've seen similar cases while running 'netperf' and the 'lockbus/ioport'
cases of the 'stress-ng' tool.

In those scenarios, the clocksource used by the kernel is tsc, while
hpet is used as the watchdog. When the skew happens, we found that
it is mostly hpet's 'fault': when the system is under extreme
pressure, a read of hpet can take a long time, and even 2
consecutive reads of hpet can have a big gap (up to 1ms+) in between.
So the skew we saw is actually caused by hpet instead of tsc, as
a tsc read is a lightweight cpu operation.

I tried the following patch to detect the skew of the watchdog itself,
and avoid wrongly judging the tsc to be unstable. It does help in
our tests, please help to review.

One further idea is to also add 2 consecutive reads of the current
clocksource, compare their gap with the watchdog's, and skip the check
if the watchdog's is bigger.

That is what I found too. I also did a 2nd watchdog read to compare the delay between the two consecutive reads against half the threshold, and skip the test if the delay exceeds it. My patch is actually similar in concept to what your patch does.
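
To illustrate the idea only (this is not the code from either patch), the check can be sketched in plain userspace C as below. The function name, the nanosecond units and the threshold value are assumptions made up for the example; the real watchdog code works on clocksource cycles and its own threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API.
 *
 * wd_first_ns and wd_second_ns are two back-to-back reads of the
 * watchdog (e.g. hpet), already converted to nanoseconds.
 * threshold_ns is the skew threshold used to judge the clocksource.
 *
 * Returns true when the delay between the two watchdog reads is larger
 * than half the threshold, meaning the watchdog read itself was too
 * slow and this round of the skew test should be skipped.
 */
static inline bool wd_read_too_slow(uint64_t wd_first_ns,
				    uint64_t wd_second_ns,
				    uint64_t threshold_ns)
{
	uint64_t wd_delay_ns = wd_second_ns - wd_first_ns;

	return wd_delay_ns > threshold_ns / 2;
}
```

With an assumed threshold of 100us, a 1ms gap between the two hpet reads makes the helper return true, so the tsc would not be judged in that round, while a 2us gap returns false and the normal check goes ahead.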

Cheers,
Longman