Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

From: Thomas Gleixner
Date: Sat Mar 22 2008 - 07:22:29 EST


On Fri, 21 Mar 2008, Thomas Gleixner wrote:
> >
> > | 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0
>
> Ok. So the watchdog trigger is a false positive.
>
> Thinking more about it, it looks like Andi's change triggers some
> hidden bug in the combination of NO_HZ and add_timer_on(), where the
> CPU on which the timer is added is likely in a long idle sleep. I look
> into this tomorrow.

Ok. Here is what's happening:

CPU0 runs the watchdog timer and schedules it on CPU1.

With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
boot process there is probably no timer pending on CPU1, which means
the idle sleep is infinite.

Now some time later CPU1 gets woken by an interrupt/IPI and runs the
timer wheel. At this point the pm_timer which is the reference clock
has already wrapped around, so the watchdog thinks that there is a
huge time difference and marks the TSC unstable.

Aside of that watchdog issue this also affects the other users of
add_timer_on(): e.g. queue_delayed_work_on().

Can you please apply the patch below and verify it with Andi's
watchdog patch applied ?

Thanks,

tglx

---
include/linux/tick.h | 4 ++++
kernel/time/tick-sched.c | 30 ++++++++++++++++++++++++++++++
kernel/timer.c | 14 +++++++++++++-
3 files changed, 47 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h
+++ linux-2.6/include/linux/tick.h
@@ -111,6 +111,8 @@ extern void tick_nohz_update_jiffies(voi
extern ktime_t tick_nohz_get_sleep_length(void);
extern void tick_nohz_stop_idle(int cpu);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
+extern int tick_nohz_cpu_needs_wakeup(int cpu);
+extern void tick_nohz_rescan_timers_on(int cpu);
# else
static inline void tick_nohz_stop_sched_tick(void) { }
static inline void tick_nohz_restart_sched_tick(void) { }
@@ -123,6 +125,8 @@ static inline ktime_t tick_nohz_get_slee
}
static inline void tick_nohz_stop_idle(int cpu) { }
static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return 0; }
+static inline int tick_nohz_cpu_needs_wakeup(int cpu) { return 0; }
+static inline void tick_nohz_rescan_timers_on(int cpu) { }
# endif /* !NO_HZ */

#endif
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -183,6 +183,36 @@ u64 get_cpu_idle_time_us(int cpu, u64 *l
}

/**
+ * tick_nohz_cpu_needs_wakeup - check possible wakeup of cpu in add_timer_on()
+ *
+ * when add_timer_on() happens on a CPU which is in a long idle sleep,
+ * then we need to wake it up so the timer wheel gets reevaluated.
+ *
+ * Note: we use idle_cpu() which checks the idle state lockless, but
+ * we are ordered against the other cpu which might be on the way to
+ * idle by the timer base lock, which we hold.
+ */
+int tick_nohz_cpu_needs_wakeup(int cpu)
+{
+ return tick_nohz_enabled && idle_cpu(cpu) &&
+ (cpu != smp_processor_id());
+}
+
+/**
+ * tick_nohz_rescan_timers_on - reevaluate the idle sleep time of a CPU
+ *
+ * When a CPU is idle and a timer got added to this CPU timer wheel
+ * via add_timer_on() then we need to make sure that the CPU
+ * reevaluates the timer wheel. Otherwise the timer might be delayed
+ * for a real long time.
+ */
+void tick_nohz_rescan_timers_on(int cpu)
+{
+ if (tick_nohz_enabled && idle_cpu(cpu))
+ smp_send_reschedule(cpu);
+}
+
+/**
* tick_nohz_stop_sched_tick - stop the idle tick from the idle task
*
* When the next event is more than a tick into the future, stop the idle tick
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -445,15 +445,27 @@ void add_timer_on(struct timer_list *tim
{
struct tvec_base *base = per_cpu(tvec_bases, cpu);
unsigned long flags;
+ int wakeidle;

timer_stats_timer_set_start_info(timer);
BUG_ON(timer_pending(timer) || !timer->function);
spin_lock_irqsave(&base->lock, flags);
timer_set_base(timer, base);
internal_add_timer(base, timer);
+ /*
+ * Check whether the other CPU is idle and needs to be
+ * triggered to reevaluate the timer wheel when nohz is
+ * active. We are protected against the other CPU fiddling
+ * with the timer by holding the timer base lock. This also
+ * makes sure that a CPU on the way to idle can not evaluate
+ * the timer wheel.
+ */
+ wakeidle = tick_nohz_cpu_needs_wakeup(cpu);
spin_unlock_irqrestore(&base->lock, flags);
-}

+ if (wakeidle)
+ tick_nohz_rescan_timers_on(cpu);
+}

/**
* mod_timer - modify a timer's timeout
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/