Re: [PATCH] clocksource: Scale the max retry number of watchdog read according to CPU numbers

From: Feng Tang
Date: Sun Jan 28 2024 - 06:45:12 EST


On Fri, Jan 26, 2024 at 11:28:36AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 26, 2024 at 11:19:50AM -0500, Waiman Long wrote:
[...]
> > > > I also suggest doing the adjustment at boot time, for example, using
> > > > an early_initcall(). That way the test code also sees the scaled value.
> > > I also thought about doing the adjustment once in the early boot phase
> > > using num_possible_cpus(), but gave up because the parameter can be
> > > changed at runtime through the sysfs module parameter interface, and
> > > the number of online CPUs can change through runtime CPU hotplug.
> > >
> > > Since the watchdog timer only fires (if not skipped) every 500 ms,
> > > how about doing the ilog2 math every time, as below:
> > >
> > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > > index 1d42d4b17327..9104bdecf34e 100644
> > > --- a/include/linux/clocksource.h
> > > +++ b/include/linux/clocksource.h
> > > @@ -291,7 +291,7 @@ static inline void timer_probe(void) {}
> > > #define TIMER_ACPI_DECLARE(name, table_id, fn) \
> > > ACPI_DECLARE_PROBE_ENTRY(timer, name, table_id, 0, NULL, 0, fn)
> > > -extern ulong max_cswd_read_retries;
> > > +extern long max_cswd_read_retries;
> > > void clocksource_verify_percpu(struct clocksource *cs);
> > > #endif /* _LINUX_CLOCKSOURCE_H */
> > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > > index c108ed8a9804..867bb36e6dad 100644
> > > --- a/kernel/time/clocksource.c
> > > +++ b/kernel/time/clocksource.c
> > > @@ -208,8 +208,8 @@ void clocksource_mark_unstable(struct clocksource *cs)
> > > spin_unlock_irqrestore(&watchdog_lock, flags);
> > > }
> > > -ulong max_cswd_read_retries = 2;
> > > -module_param(max_cswd_read_retries, ulong, 0644);
> > > +long max_cswd_read_retries = -1;
> > > +module_param(max_cswd_read_retries, long, 0644);
> > > EXPORT_SYMBOL_GPL(max_cswd_read_retries);
> > > static int verify_n_cpus = 8;
> > > module_param(verify_n_cpus, int, 0644);
> > > @@ -225,8 +225,17 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow,
> > > unsigned int nretries;
> > > u64 wd_end, wd_end2, wd_delta;
> > > int64_t wd_delay, wd_seq_delay;
> > > + long max_retries = max_cswd_read_retries;
> > > +
> > > + if (max_cswd_read_retries <= 0) {
> > > + if (max_cswd_read_retries != -1)
> > > + pr_warn_once("max_cswd_read_retries has been set to an invalid value: %ld\n",
> > > + max_cswd_read_retries);
> > > - for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) {
> > > + max_retries = ilog2(num_online_cpus()) + 1;
> > > + }
> > > +
> > > + for (nretries = 0; nretries <= max_retries; nretries++) {
> > > local_irq_disable();
> > > *wdnow = watchdog->read(watchdog);
> > > *csnow = cs->read(cs);
> > > @@ -238,7 +247,7 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow,
> > > wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult,
> > > watchdog->shift);
> > > if (wd_delay <= WATCHDOG_MAX_SKEW) {
> > > - if (nretries > 1 || nretries >= max_cswd_read_retries) {
> > > + if (nretries > 1 || nretries >= max_retries) {
> > > pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
> > > smp_processor_id(), watchdog->name, nretries);
> > > }
> >
> > The max_cswd_read_retries value is also used in
> > kernel/time/clocksource-wdtest.c. You will have to apply similar logic to
> > clocksource-wdtest.c if it is not done once in early_init.
>
> Good point! If it is not done in an early_init(), could we please
> have a function for the common code?

Thanks Waiman for the catch! And sure, will add a new helper function
for that.
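
As a rough illustration of what such a helper might look like, here is a
minimal userspace sketch. The names cswd_max_retries() and ilog2_u() are
hypothetical and not from the patch; ilog2_u() only stands in for the
kernel's ilog2(), and the online CPU count is passed in as an argument
instead of being read with num_online_cpus(). It assumes the same rule as
the diff above: a positive parameter is used as is, anything else falls
back to ilog2(cpus) + 1.

```c
#include <stdio.h>

/* Userspace stand-in for the kernel's ilog2(): floor(log2(n)) for n >= 1. */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/*
 * Hypothetical common helper: return the effective watchdog read retry
 * limit. A positive parameter value is honored unchanged; zero or a
 * negative value (including the -1 default) selects a limit scaled by
 * the number of online CPUs.
 */
static long cswd_max_retries(long param, unsigned int online_cpus)
{
	if (param > 0)
		return param;

	/* Guard the sketch against a zero CPU count. */
	if (online_cpus == 0)
		online_cpus = 1;

	return (long)ilog2_u(online_cpus) + 1;
}
```

With a helper of this shape, both cs_watchdog_read() and
clocksource-wdtest.c could call the same function rather than reading
max_cswd_read_retries directly, so the two stay consistent when the
parameter or the CPU count changes at runtime.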

Thanks,
Feng

>
> Thanx, Paul