Re: [PATCH 0/2] clocksource: Avoid incorrect hpet fallback

From: Waiman Long
Date: Wed Nov 10 2021 - 20:20:08 EST



On 11/10/21 19:04, Paul E. McKenney wrote:
On Wed, Nov 10, 2021 at 06:25:14PM -0500, Waiman Long wrote:
On 11/10/21 17:32, Paul E. McKenney wrote:
On Wed, Nov 10, 2021 at 05:17:30PM -0500, Waiman Long wrote:
It was found that when an x86 system was being stressed by running
various different benchmark suites, the clocksource watchdog might
occasionally mark TSC as unstable and fall back to hpet, which has a
significant impact on system performance.

The current watchdog clocksource skew threshold of 50us is found to be
insufficient, so patch 1 restores the 100us value that was in use before
commit 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold").
Patch 2 adds a Kconfig option that lets kernel builders control the
actual threshold to be used.
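
To illustrate why the threshold value matters, here is a minimal userspace
sketch of a skew check. It is not the kernel's watchdog code: the function
name, its parameters, and the idea of passing the threshold as an argument
are assumptions made for illustration only. It compares the interval
measured by the clocksource under test against the interval measured by
the watchdog and reports whether the difference exceeds a threshold given
in microseconds.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NSEC_PER_USEC 1000LL

/*
 * Hypothetical helper, not a kernel API.
 *
 * cs_delta_ns:   interval measured by the clocksource under test (ns)
 * wd_delta_ns:   same interval measured by the watchdog clocksource (ns)
 * max_skew_usec: allowed disagreement in microseconds (e.g. 50 or 100)
 *
 * Returns true when the two measurements disagree by more than the
 * threshold, i.e. when the clocksource would be considered suspect.
 */
static bool skew_exceeds_threshold(int64_t cs_delta_ns, int64_t wd_delta_ns,
				   int64_t max_skew_usec)
{
	int64_t skew_ns = llabs((long long)(cs_delta_ns - wd_delta_ns));

	return skew_ns > max_skew_usec * NSEC_PER_USEC;
}
```

With a measured disagreement of 75us, for example, this check trips at a
50us threshold but not at a 100us one, which is the kind of borderline
case a looser threshold is meant to tolerate.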

Waiman Long (2):
clocksource: Avoid accidental unstable marking of clocksources
clocksource: Add a Kconfig option for WATCHDOG_MAX_SKEW
The ability to control the fine-grained threshold seems useful, but is
the TSC still marked unstable when this commit from -rcu is applied?
It has passed significant testing on other workloads.

2a43fb0479aa ("clocksource: Forgive repeated long-latency watchdog clocksource reads")

If the patch below takes care of your situation, my thought is to
also take your second patch, which would allow people to set the
cutoff more loosely or more tightly, as their situation dictates.

Thoughts?
That is commit 14dbb29eda51 ("clocksource: Forgive repeated long-latency
watchdog clocksource reads") in your linux-rcu git tree. From reading the
patch, I believe it should be able to address the hpet fallback problem that
Red Hat had encountered. Your patch said it was an out-of-tree patch. Are
you planning to mainline it?
Yes, I expect to submit it into the next merge window (not the current
v5.16 merge window, but v5.17). However, if your situation is urgent, and
if it works for you, I could submit it as a fix for an earlier regression.

I will build a test kernel based on your patch and ask our benchmarking
group to run their test suites. It will take a day or two to get a
definitive answer even though I believe it should fix the issue.

Cheers,
Longman