Re: [PATCH] watchdog: Inject NMI when locked up and going to panic

From: Andrew Morton
Date: Mon Nov 19 2012 - 19:19:08 EST


On Sat, 17 Nov 2012 19:28:53 -0500
Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:

> Send an NMI to all CPUs when a lockup is detected and the lockup
> watchdog code is configured to panic. This gives us a fairly up-to-date
> snapshot of all CPUs in the system.
>
> This lets us get a stack trace of all CPUs, which makes life easier
> when trying to debug a deadlock, and the NMI doesn't change anything
> since the next step is a kernel panic.
>

nit: I'll rename this to "watchdog: trigger all-cpu backtrace when
locked up and going to panic". We don't know how the arch implements
trigger_all_cpu_backtrace() at this level!


> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -239,10 +239,12 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  		if (__this_cpu_read(hard_watchdog_warn) == true)
>  			return;
>
> -		if (hardlockup_panic)
> +		if (hardlockup_panic) {
> +			trigger_all_cpu_backtrace();
>  			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> -		else
> +		} else {
>  			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> +		}
>
>  		__this_cpu_write(hard_watchdog_warn, true);
>  		return;
> @@ -323,8 +325,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  		else
>  			dump_stack();
>
> -		if (softlockup_panic)
> +		if (softlockup_panic) {
> +			trigger_all_cpu_backtrace();
>  			panic("softlockup: hung tasks");
> +		}
>  		__this_cpu_write(soft_watchdog_warn, true);
>  	} else
>  		__this_cpu_write(soft_watchdog_warn, false);

The change seems sensible, but I wonder about CONFIG_SMP=n machines.
Will they end up getting the same backtrace displayed twice?

(I don't remember whether trigger_all_cpu_backtrace() is really
trigger_all_other_cpu_backtrace() and we didn't document it).
