[PATCH] Clocksource: Avoid misjudgment of clocksource

From: yanghui
Date: Tue Oct 19 2021 - 00:14:23 EST




在 2021/10/19 上午12:14, John Stultz 写道:
On Tue, Oct 12, 2021 at 1:06 AM brookxu <brookxu.cn@xxxxxxxxx> wrote:
John Stultz wrote on 2021/10/12 13:29:
On Mon, Oct 11, 2021 at 10:23 PM brookxu <brookxu.cn@xxxxxxxxx> wrote:
John Stultz wrote on 2021/10/12 12:52 下午:
On Sat, Oct 9, 2021 at 7:04 AM brookxu <brookxu.cn@xxxxxxxxx> wrote:
If we record the watchdog's start_time in clocksource_start_watchdog(), and then
when we verify cycles in clocksource_watchdog(), check whether the clocksource
watchdog is blocked. Due to MSB verification, if the blocked time is greater than
half of the watchdog timer max_cycles, then we can safely ignore the current
verification? Do you think this idea is okay?

I can't say I totally understand the idea. Maybe could you clarify with a patch?


Sorry, it looks almost as follows:

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index b8a14d2..87f3b67 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -119,6 +119,7 @@
static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
static DEFINE_SPINLOCK(watchdog_lock);
static int watchdog_running;
+static unsigned long watchdog_start_time;
static atomic_t watchdog_reset_pending;

static inline void clocksource_watchdog_lock(unsigned long *flags)
@@ -356,6 +357,7 @@ static void clocksource_watchdog(struct timer_list *unused)
int next_cpu, reset_pending;
int64_t wd_nsec, cs_nsec;
struct clocksource *cs;
+ unsigned long max_jiffies;
u32 md;

spin_lock(&watchdog_lock);
@@ -402,6 +404,10 @@ static void clocksource_watchdog(struct timer_list *unused)
if (atomic_read(&watchdog_reset_pending))
continue;

+ max_jiffies = nsecs_to_jiffies(cs->max_idle_ns);
+ if (time_is_before_jiffies(watchdog_start_time + max_jiffies))
+ continue;
+

Sorry, what is the benefit of using jiffies here? Jiffies are
updated by counting the number of tick intervals on the current
clocksource.

This seems like circular logic, where we're trying to judge the
current clocksource by using something we derived from the current
clocksource.
That's why the watchdog clocksource is important, as it's supposed to
be a separate counter that is more reliable (but likely slower) then
the preferred clocksource.

So I'm not really sure how this helps.

The earlier patch by yanghui at least used the watchdog interval to
decide if the watchdog timer had expired late. Which seemed
reasonable, but I thought it might be helpful to add some sort of a
counter so if the case is happening repeatedly (timers constantly
being delayed) we have a better signal that the watchdog and current
clocksource are out of sync. Because again, timers are fired based on

I think only have a signal ls not enough. we need to prevent
clocksource from being incorrectly switched.

The Timer callback function clocksource_watchdog() is executed in the
context of softirq(run_timer_softirq()). So if softirq is disabled for
long time(One situation is long time softlockup), clocksource_watchdog()
will be delay executed.
the current clocksource. So constant delays likely mean things are
wrong.
> thanks
-john

thanks
-john

I think it will be better to add this to my patch:
/*
* Interval: 0.5sec.
- * MaxInterval: 1s.
+ * MaxInterval: 20s.
*/
#define WATCHDOG_INTERVAL (HZ >> 1)
-#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)
+#define WATCHDOG_MAX_INTERVAL_NS (20 * NSEC_PER_SEC)

thanks