Re: [PATCH 0/2] clocksource: Avoid incorrect hpet fallback

From: Feng Tang
Date: Wed Nov 10 2021 - 20:53:38 EST


On Wed, Nov 10, 2021 at 08:30:10PM -0500, Waiman Long wrote:
>
> On 11/10/21 20:23, Feng Tang wrote:
> > Hi Waiman, Paul,
> >
> > On Wed, Nov 10, 2021 at 05:17:30PM -0500, Waiman Long wrote:
> > > It was found that when an x86 system was being stressed by running
> > > various different benchmark suites, the clocksource watchdog might
> > > occasionally mark TSC as unstable and fall back to hpet which will
> > > have a significant impact on system performance.
> > We've seen similar cases while running 'netperf' and the 'lockbus/ioport'
> > cases of the 'stress-ng' tool.
> >
> > In those scenarios, the clocksource used by the kernel is tsc, while
> > hpet is used as the watchdog. When the skew happens, we found that
> > it is mostly hpet's 'fault': when the system is under extreme
> > pressure, a read of hpet can take a long time, and even 2
> > consecutive reads of hpet can have a big gap (up to 1ms+) in between.
> > So the skew we saw is actually caused by hpet instead of tsc, as
> > a tsc read is a lightweight cpu operation.
> >
> > I tried the following patch to detect the skew of the watchdog itself,
> > and avoid wrongly judging the tsc to be unstable. It does help in
> > our tests, please help to review.
> >
> > One further idea is to also add 2 consecutive reads of the current
> > clocksource, compare their gap with the watchdog's, and skip the check
> > if the watchdog's is bigger.
>
> That is what I found too. And I also did a 2nd watchdog read to compare the
> consecutive delay versus half the threshold and skip the test if it exceeds
> it. My patch is actually similar in concept to what your patch does.

Aha, yes, I missed that.

I just got to the office, saw the discussion around the 0/2 patch, and
replied without going through the patches. Sorry about that.

0day reported some cases from stress-ng testing, and we are still testing
the different cases we've seen.
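
For anyone following the thread, the common idea (read the watchdog twice
back to back, and skip the check when that gap alone is too large) can be
sketched roughly as below. This is a userspace illustration only, not the
code from either patch: wd_check(), read_fn, fake_clock and the "half the
threshold" cutoff as written here are hypothetical names and choices made
for the example, and the real watchdog code also handles counter masks,
cycle-to-ns conversion and retries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader callback: returns a timestamp in nanoseconds. */
typedef uint64_t (*read_fn)(void *ctx);

enum wd_verdict {
	WD_SKIP,	/* watchdog read itself was delayed, do not judge */
	WD_OK,		/* clocksource agrees with watchdog */
	WD_UNSTABLE,	/* clocksource deviates beyond the threshold */
};

/*
 * Read the watchdog twice in a row. If the gap between those two reads
 * already exceeds half the threshold, the watchdog read was delayed
 * (e.g. a slow hpet access under load), so skip this round instead of
 * blaming the clocksource. Otherwise compare the two intervals.
 */
enum wd_verdict wd_check(read_fn read_wd, void *wd_ctx,
			 read_fn read_cs, void *cs_ctx,
			 uint64_t last_wd, uint64_t last_cs,
			 uint64_t threshold_ns)
{
	uint64_t wd1 = read_wd(wd_ctx);
	uint64_t wd2 = read_wd(wd_ctx);
	uint64_t cs, wd_delta, cs_delta, diff;

	if (wd2 - wd1 > threshold_ns / 2)
		return WD_SKIP;

	cs = read_cs(cs_ctx);
	wd_delta = wd2 - last_wd;
	cs_delta = cs - last_cs;
	diff = wd_delta > cs_delta ? wd_delta - cs_delta
				   : cs_delta - wd_delta;

	return diff > threshold_ns ? WD_UNSTABLE : WD_OK;
}

/* Scripted fake clock, only for trying the sketch out. */
struct fake_clock {
	const uint64_t *vals;
	size_t next;
};

uint64_t fake_read(void *ctx)
{
	struct fake_clock *c = ctx;

	return c->vals[c->next++];
}
```

With a fake watchdog returning 1000 and then 1900 and a threshold of 100,
the back-to-back gap (900) is above the cutoff (50), so the round is
skipped and the clocksource is never judged.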

Thanks,
Feng

> Cheers,
> Longman