Re: [PATCH] clocksource: Improve cs_watchdog_read()

From: Paul E. McKenney
Date: Wed Nov 10 2021 - 10:05:10 EST


On Wed, Nov 10, 2021 at 02:53:27PM +0100, Heiner Kallweit wrote:
> On 10.11.2021 13:48, Paul E. McKenney wrote:
> > On Tue, Nov 09, 2021 at 09:55:08PM +0100, Heiner Kallweit wrote:
> >> If max_cswd_read_retries is set to 0 or 1 then the current warning
> >> behavior doesn't seem to make too much sense to me.
> >> If set to 0, then we'd warn with each watchdog run.
> >> If set to 1, then we'd warn at the first retry, even though the commit
> >> description of db3a34e17433 states that one retry is expected behavior.
> >> If printing a message at all in this case, then it should be debug
> >> level.
> >
> > The behavior for max_cswd_read_retries==1 is exactly what you want when
> > you are checking to see whether or not your system would retry at all
> > for the duration of a given run.
> >
> > The behavior for max_cswd_read_retries==0 is exactly what you want when
> > you are testing the ability to print that message on a system that will
> > not do a retry in a reasonable period of time.
> >
> > Or am I missing something here?
> >
> For me this mixes production warning and debug features.

I know of no law saying that a given module parameter cannot be used for
both debugging and production features. In fact, there are any number
of parameters that take ranges of values, some of which should not be
used in production. In addition, Linux supports such a wide range of
environments and workloads that people supporting different production
environments will have very different ideas about the advisability of
setting a given value for a given module parameter.

So if your production environments don't like setting the value of
clocksource.max_cswd_read_retries to less than two, why not just prohibit
such settings in your production environments?

> To support your debug use cases, maybe use something like this?
>
> if (nretries > 1)
> pr_warn()
> else if (nretries >= max_cswd_read_retries)
> pr_debug()

I am sorry, but I am not seeing the benefit of this change. How does
this help you?

Thanx, Paul

> >> Whilst being at it, move declaration of wd_end and wd_delta into the
> >> loop and remove not needed braces.
> >
> > I am OK with moving those two variables into the "for" loop.
> >
> > I am personally OK removing the braces, but if I remember correctly,
> > my upstream maintainer asked that I add them due to the statement being
> > split across two lines.
> >
> > Thanx, Paul
> >
> >> Signed-off-by: Heiner Kallweit <hkallweit1@xxxxxxxxx>
> >> ---
> >> kernel/time/clocksource.c | 6 +++---
> >> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> >> index f29d1a524..8c0be9c02 100644
> >> --- a/kernel/time/clocksource.c
> >> +++ b/kernel/time/clocksource.c
> >> @@ -208,10 +208,11 @@ module_param(verify_n_cpus, int, 0644);
> >> static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
> >> {
> >> unsigned int nretries;
> >> - u64 wd_end, wd_delta;
> >> int64_t wd_delay;
> >>
> >> for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) {
> >> + u64 wd_end, wd_delta;
> >> +
> >> local_irq_disable();
> >> *wdnow = watchdog->read(watchdog);
> >> *csnow = cs->read(cs);
> >> @@ -222,10 +223,9 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
> >> wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult,
> >> watchdog->shift);
> >> if (wd_delay <= WATCHDOG_MAX_SKEW) {
> >> - if (nretries > 1 || nretries >= max_cswd_read_retries) {
> >> + if (nretries > 1)
> >> pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
> >> smp_processor_id(), watchdog->name, nretries);
> >> - }
> >> return true;
> >> }
> >> }
> >> --
> >> 2.33.1
> >>
>