Re: [PATCH v10 clocksource 1/7] clocksource: Provide module parameters to inject delays in watchdog

From: Paul E. McKenney
Date: Wed Apr 28 2021 - 09:57:33 EST


On Wed, Apr 28, 2021 at 12:49:12PM +0800, Luming Yu wrote:
> We 'd expect to see clock_source watchdog can avoid to do wrong thing
> due to the injected delay or
> in real life delay by doing tsc sync-re-check by applying the patch-set.
> However , the noise is still cause wrong actions and the patch doesn't
> defeat the injected's delay
> please correct me if I'm wrong.

Injecting delay is just a test. In real life, if you got four delays
in a row, the cause is likely that the clock read is broken and taking
a very long time. In which case marking that clock unstable is a
reasonable response.

Other causes include having an NMI or SMI storm, getting extremely
unlucky with vCPU preemptions, and so on. In these cases, you are not
making much forward progress anyway, so marked-unstable clock is the
least of your worries.

I ran this (without injected delays) for more than a thousand hours on
rcutorture guest OSes and didn't see any instances of even two consecutive
bad reads. There was the very occasional single instance of a bad read.

Therefore, the code marks the clock unstable if it has four bad reads
in a row, as it should.

Thanx, Paul

> parameters]# cat *
> 1
> 1
> -1
> 3
> 8
>
> [62939.809615] clocksource: clocksource_watchdog_inject_delay():
> Injecting delay.
> [62939.816867] clocksource: clocksource_watchdog_inject_delay():
> Injecting delay.
> [62939.824094] clocksource: clocksource_watchdog_inject_delay():
> Injecting delay.
> [62939.831314] clocksource: clocksource_watchdog_inject_delay():
> Injecting delay.
> [62939.838536] clocksource: timekeeping watchdog on CPU26: hpet
> read-back delay of 7220833ns, attempt 4, marking unstable
> [62939.849230] tsc: Marking TSC unstable due to clocksource watchdog
> [62939.855340] TSC found unstable after boot, most likely due to
> broken BIOS. Use 'tsc=unstable'.
> [62939.863972] sched_clock: Marking unstable (62943398530130,
> -3543150114)<-(62941186607503, -1331276112)
> [62939.875104] clocksource: Checking clocksource tsc synchronization
> from CPU 123 to CPUs 0,6,26,62,78,97-98,137.
> [62939.886518] clocksource: Switched to clocksource hpet
>
> On Tue, Apr 27, 2021 at 2:27 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Mon, Apr 26, 2021 at 10:56:27AM -0700, Andi Kleen wrote:
> > > > ------------------------------------------------------------------------
> > > >
> > > > - module parameters
> > > >
> > > > If the scope of the fault injection capability is limited to a
> > > > single kernel module, it is better to provide module parameters to
> > > > configure the fault attributes.
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > And in this case, the fault injection capability is in fact limited to
> > > > kernel/clocksource.c.
> > >
> > >
> > > I disagree with this recommendation because it prevents fuzzer coverage.
> > >
> > > Much better to have an uniform interface that can be automatically
> > > explored.
> >
> > The permissions for these module parameters is 0644, so there is no
> > reason why the fuzzers cannot use them via sysfs.
> >
> > Thanx, Paul