Re: [RFC][PATCH] nmi watchdog: handle NMI_IO_APIC on nmi_watchdog

From: Aristeu Rozanski
Date: Wed Mar 26 2008 - 14:44:48 EST


> > > stop_apic_nmi_watchdog() doesn't currently properly disable the
> > > generation of NMIs when they come from an IO-APIC, so this will need
> > > more fixes I believe. One approach would be to save the IO-APIC id and
> > > pin when the watchdog is set up, and use it later on to poke that
> > > IO-APIC register to disable NMI generation there.
> > the patch I sent has this change:
> >
> > @@ -270,6 +270,8 @@ void stop_apic_nmi_watchdog(void *unused
> >  		return;
> >  	if (nmi_watchdog == NMI_LOCAL_APIC)
> >  		lapic_watchdog_stop();
> > +	else
> > +		__acpi_nmi_disable(NULL);
> >  	__get_cpu_var(wd_enabled) = 0;
> >  	atomic_dec(&nmi_active);
> >  }
> >
> > and:
> > static void __acpi_nmi_disable(void *__unused)
> > {
> > 	apic_write(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED);
> > }
> >
> > do you think this isn't enough?
>
> but this stops all NMIs, not just the IO-APIC generated ones, doesn't it?
This is the reverse of:

static void __init setup_nmi(void)
{
	/*
	 * Dirty trick to enable the NMI watchdog ...
	 * We put the 8259A master into AEOI mode and
	 * unmask on all local APICs LVT0 as NMI.
	 *
	 * The idea to use the 8259A in AEOI mode ('8259A Virtual Wire')
	 * is from Maciej W. Rozycki - so we do not have to EOI from
	 * the NMI handler or the timer interrupt.
	 */
	apic_printk(APIC_VERBOSE, KERN_INFO "activating NMI Watchdog ...");

	enable_NMI_through_LVT0();

	apic_printk(APIC_VERBOSE, " done.\n");
}
where:
void __cpuinit enable_NMI_through_LVT0(void)
{
	unsigned int v;

	/* unmask and set to NMI */
	v = APIC_DM_NMI;
	apic_write(APIC_LVT0, v);
}
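
For illustration only, here is a minimal user-space sketch of the two LVT0
values involved: the one written by enable_NMI_through_LVT0() and the one
written by __acpi_nmi_disable(). The constants mirror the kernel's
APIC_DM_NMI and APIC_LVT_MASKED definitions; the helper names
(lvt0_enable_value, lvt0_disable_value, lvt_is_masked, lvt_delivery_mode)
are hypothetical and exist only for this example. It touches no hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as the kernel's apicdef.h definitions. */
#define APIC_DM_NMI     0x00400u    /* delivery mode field (bits 8-10) = NMI */
#define APIC_LVT_MASKED (1u << 16)  /* mask bit of an LVT entry */

/* Hypothetical helper: value written when the watchdog is enabled. */
static uint32_t lvt0_enable_value(void)
{
	return APIC_DM_NMI;                    /* NMI mode, mask bit clear */
}

/* Hypothetical helper: value written when the watchdog is disabled. */
static uint32_t lvt0_disable_value(void)
{
	return APIC_DM_NMI | APIC_LVT_MASKED;  /* NMI mode, mask bit set */
}

/* Hypothetical helper: true if the LVT entry is masked. */
static bool lvt_is_masked(uint32_t v)
{
	return (v & APIC_LVT_MASKED) != 0;
}

/* Hypothetical helper: extract the 3-bit delivery mode (4 means NMI). */
static unsigned int lvt_delivery_mode(uint32_t v)
{
	return (v >> 8) & 0x7u;
}
```

The only difference between the two values is the mask bit, which is why
the disable path reads as the reverse of the enable path.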

I must admit I don't really understand how the NMI watchdog through the
IO-APIC works; I couldn't find proper documentation on it. I just tried to
revert what's done when it's enabled, and everything worked as expected.
That includes a test by a customer who uses development boxes with an NMI
button: first the NMI watchdog was disabled using a patch much like the
one I sent, then an NMI was generated by pushing the button, and the box
crashed as it should (the old unknown_nmi_panic sysctl was set). So,
unless the NMI button does something different to deliver the NMI, other
NMIs should keep working.
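
One possible reading of that result, sketched below as a user-space
simulation: the local APIC has separate LVT entries for its LINT0 and LINT1
pins, so masking LVT0 leaves LVT1 alone. Whether the NMI button on those
boxes actually arrives through LINT1 is an assumption here, not something
confirmed in this thread. The register offsets match the kernel's APIC_LVT0
and APIC_LVT1; everything prefixed sim_ is hypothetical and only models the
registers in an array.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_APIC_LVT0   0x350u      /* same offset as the kernel's APIC_LVT0 */
#define SIM_APIC_LVT1   0x360u      /* same offset as the kernel's APIC_LVT1 */
#define APIC_DM_NMI     0x00400u
#define APIC_LVT_MASKED (1u << 16)

/* Fake register file: one 32-bit slot per 16-byte APIC register. */
static uint32_t sim_regs[0x400 / 16];

static void sim_apic_write(uint32_t reg, uint32_t v)
{
	sim_regs[reg >> 4] = v;
}

static uint32_t sim_apic_read(uint32_t reg)
{
	return sim_regs[reg >> 4];
}

/* Models what __acpi_nmi_disable() writes: LVT0 only. */
static void sim_nmi_disable_lvt0(void)
{
	sim_apic_write(SIM_APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED);
}

/* True if the given LVT entry is set to NMI mode and not masked. */
static bool sim_lint_delivers_nmi(uint32_t reg)
{
	uint32_t v = sim_apic_read(reg);

	return (v & 0x700u) == APIC_DM_NMI && !(v & APIC_LVT_MASKED);
}
```

In this model, calling sim_nmi_disable_lvt0() after both entries were set to
unmasked NMI mode stops delivery on LVT0 while LVT1 keeps delivering.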

--
Aristeu
