[patch] x86_64: do not enable the NMI watchdog by default

From: Ingo Molnar
Date: Thu Dec 07 2006 - 07:15:14 EST



* Len Brown <len.brown@xxxxxxxxx> wrote:

> Personally I have never been a big fan of having the NMI watchdog
> running by default on all systems -- but Andi insists that it helps
> him debug failures, so tick it does...

enabling it by default was IMO a really bad decision and it needs to be
undone via the patch attached further below.

If Andi wants to debug stuff via the NMI wachdog, he should use the
nmi_watchdog=2 boot option: next to the tty=ttyS0 serial console
options, initcall_debug, apic=debug, earlyprintk and myriads of other
kernel-hackers-only boot options that we all use to 'help debug
failures' ...

also, lock debugging facilities catch lockup possibilities (and actual
lockups) alot more efficiently, i cannot remember the last time the NMI
watchdog caught anything on my boxes without some other facility not
triggering first. (and i have it enabled everywhere)

I have run a quick analysis over all locking related crashes i triggered
on one particular testbox over the past 2 weeks. Out of 26 separate lock
related incidents spread out equally over that timeframe (out of 497
bootups on this box), this was the distribution of debugging facilities
that caught the bugs:

1 BUG: spinlock lockup on
1 [ INFO: inconsistent lock state ]
2 BUG: scheduling while atomic
2 [ BUG: bad unlock balance detected! ]
2 BUG: sleeping function called from invalid context
6 BUG: scheduling with irqs disabled
5 [ INFO: hard-safe -> hard-unsafe lock order detected ]
7 BUG: using smp_processor_id() in preemptible [] code

8 were caught by lockdep, 8 by atomicity checks in the scheduler, 7 by
DEBUG_PREEMPT and 1 by DEBUG_SPINLOCK.

Note: zero were caught by the NMI watchdog, and i run the NMI watchdog
enabled by default on all architectures, and i have serial logging of
everything.

but even for the typical distro kernel and for the typical user, the NMI
watchdog is normally useless, because NMI lockups rarely make it into
the syslog and X just locks up without showing anything on the screen.

Ingo

----------------------->
Subject: [patch] x86_64: do not enable the NMI watchdog by default
From: Ingo Molnar <mingo@xxxxxxx>

do not enable the NMI watchdog by default. Now that we have lockdep i
cannot remember the last time it caught a real bug, but the NMI watchdog
can /cause/ problems. Furthermore, to the typical user, an NMI watchdog
assert results in a total lockup anyway (if under X). In that sense, all
that the NMI watchdog does is that it makes the system /less/ stable and
/less/ debuggable.

people can still enable it either after bootup via:

echo 1 > /proc/sys/kernel/nmi

or via the nmi_watchdog=1 or nmi_watchdog=2 boot options.

build and boot tested on an Athlon64 box.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
arch/x86_64/kernel/apic.c | 1 -
arch/x86_64/kernel/io_apic.c | 5 +----
arch/x86_64/kernel/nmi.c | 2 +-
arch/x86_64/kernel/smpboot.c | 1 -
include/asm-x86_64/nmi.h | 1 -
5 files changed, 2 insertions(+), 8 deletions(-)

Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -443,7 +443,6 @@ void __cpuinit setup_local_APIC (void)
oldvalue, value);
}

- nmi_watchdog_default();
setup_apic_nmi_watchdog(NULL);
apic_pm_activate();
}
Index: linux/arch/x86_64/kernel/io_apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/io_apic.c
+++ linux/arch/x86_64/kernel/io_apic.c
@@ -1604,7 +1604,6 @@ static inline void check_timer(void)
*/
unmask_IO_APIC_irq(0);
if (!no_timer_check && timer_irq_works()) {
- nmi_watchdog_default();
if (nmi_watchdog == NMI_IO_APIC) {
disable_8259A_irq(0);
setup_nmi();
@@ -1630,10 +1629,8 @@ static inline void check_timer(void)
setup_ExtINT_IRQ0_pin(apic2, pin2, vector);
if (timer_irq_works()) {
apic_printk(APIC_VERBOSE," works.\n");
- nmi_watchdog_default();
- if (nmi_watchdog == NMI_IO_APIC) {
+ if (nmi_watchdog == NMI_IO_APIC)
setup_nmi();
- }
return;
}
/*
Index: linux/arch/x86_64/kernel/nmi.c
===================================================================
--- linux.orig/arch/x86_64/kernel/nmi.c
+++ linux/arch/x86_64/kernel/nmi.c
@@ -181,7 +181,7 @@ static __cpuinit inline int nmi_known_cp
}

/* Run after command line and cpu_init init, but before all other checks */
-void nmi_watchdog_default(void)
+static inline void nmi_watchdog_default(void)
{
if (nmi_watchdog != NMI_DEFAULT)
return;
Index: linux/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smpboot.c
+++ linux/arch/x86_64/kernel/smpboot.c
@@ -866,7 +866,6 @@ static int __init smp_sanity_check(unsig
*/
void __init smp_prepare_cpus(unsigned int max_cpus)
{
- nmi_watchdog_default();
current_cpu_data = boot_cpu_data;
current_thread_info()->cpu = 0; /* needed? */
set_cpu_sibling_map(0);
Index: linux/include/asm-x86_64/nmi.h
===================================================================
--- linux.orig/include/asm-x86_64/nmi.h
+++ linux/include/asm-x86_64/nmi.h
@@ -59,7 +59,6 @@ extern void disable_timer_nmi_watchdog(v
extern void enable_timer_nmi_watchdog(void);
extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);

-extern void nmi_watchdog_default(void);
extern int setup_nmi_watchdog(char *);

extern atomic_t nmi_active;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/