Re: [PATCH] clocksource: Print clocksource name when clocksource is tested unstable

From: Feng Tang
Date: Thu Nov 17 2022 - 03:39:05 EST


On Thu, Nov 17, 2022 at 04:03:07PM +0800, Sun, Yunying wrote:
> Hi Feng,
>
> For the "tsc fall back to hpet" issue, thanks a lot for pointing me to the fix proposal patch earlier:
> https://lore.kernel.org/lkml/20221021062131.1826810-1-feng.tang@xxxxxxxxx/ .
>
> I shared this patch set to our customer yesterday.
> Just got feedback that customer has verified the "tsc fall back to hpet" issue does not appear anymore with this patch set, after 24hrs stress test on 8 servers.

Thanks for the testing! So your case is a real case that current
'nr_online_nodes' check cannot cover, which Dave concerned about.

Thanks,
Feng

> -Yunying
>
> -----Original Message-----
> From: Tang, Feng <feng.tang@xxxxxxxxx>
> Sent: Thursday, 17 November, 2022 14:47
> To: Sun, Yunying <yunying.sun@xxxxxxxxx>
> Cc: tglx@xxxxxxxxxxxxx; peterz@xxxxxxxxxxxxx; paulmck@xxxxxxxxxx; longman@xxxxxxxxxx; x86@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] clocksource: Print clocksource name when clocksource is tested unstable
>
> On Wed, Nov 16, 2022 at 04:22:21PM +0800, Yunying Sun wrote:
> > Some "TSC fall back to HPET" cases are seen on systems that have more
> > than 2 numa nodes. When this happens, in kernel log it has:
>
> Hi Yunying,
>
> If the system's physical sockests number <= 2, I would suggest you to try the patch https://lore.kernel.org/lkml/20221021062131.1826810-1-feng.tang@xxxxxxxxx/,
> which is still under review and discussion.
>
> Thanks,
> Feng
>
> > clocksource: timekeeping watchdog on CPU168: hpet read-back delay of
> > 4296200ns, attempt 4, marking unstable
> >
> > The "hpet" here is misleading since it prints only the name of
> > watchdog, where actually it's measuring the delay of 3 reads: wd-clocksource-wd.
> >
> > Signed-off-by: Yunying Sun <yunying.sun@xxxxxxxxx>
> > ---
> > kernel/time/clocksource.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 8058bec87ace..fac8c0d90e61 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -257,8 +257,8 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow,
> > goto skip_test;
> > }
> >
> > - pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n",
> > - smp_processor_id(), watchdog->name, wd_delay, nretries);
> > + pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd read-back delay of %lldns, attempt %d, marking unstable\n",
> > + smp_processor_id(), cs->name, wd_delay, nretries);
> > return WD_READ_UNSTABLE;
> >
> > skip_test:
> > --
> > 2.17.0
> >