Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback

From: Ingo Molnar
Date: Fri Sep 05 2008 - 04:58:15 EST



* Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:

> > Prarit's patch disabled the timer upon registering a callback to
> > prevent this case. The thought was if you have your own handler you
> > could provide your own watchdog.
>
> In theory, you could do the same while registering the handler on die
> chain?
>
> I don't get the whole point. So we are looking for a system where no
> body else uses an NMI for any purpose and the moment NMI happens, this
> driver will go and panic() the system. I don't get, what do we achive
> by that?
>
> Looks like you got some device in platform which raises an NMI and
> which indicates that something is wrong and log the message and do a
> panic(). But you don't have a way to find out if that device has
> raised the interrupt or something else has raised the NMI, hence you
> want to stop the watchdog timer and assume any NMI henceforth is from
> device?
>
> So any functionality which is dependent on NMI, will not work as long
> as this driver is loaded.

exactly. The proposed patch brings the NMI notifier arms race to its
next level: it is in essence a "super high priority" notifier that wants
exclusivity. That is unnecessary: we already have sufficient
infrastructure in place to recognize the priority of notifiers, and die
notifiers make active use of it. (see the list below)

If the goal is to log relevant NMI events to NVRAM for later inspection
then the solution is to register a low-priority notifier which will
execute if no other notifier shows interest in an NMI event.

If the goal is to reboot the system if there's a "bad" NMI, then the
solution is to register a low-priority notifier which will execute and
panic the system if no other notifier shows interest in that NMI event.

This means all the 'good' NMI sources need to register at higher
priority and all the standard platform fallback notifiers need to
register at lowest priority. Debuggers go last. If any of the existing
notifiers in the kernel dont follow the rules and cause problems then we
can fix up those.

Here's a list of all current die notifiers we use in the kernel:

prio
----
# core kernel:

kernel/kprobes.c # instruction probing +INT_MAX
kernel/trace/trace.c # dump trace 150

# arch/x86:

arch/x86/mm/kmmio.c # mmiotrace, debug feature 0
arch/x86/kernel/kgdb.c # kgdb, kernel debugger -INT_MAX
arch/x86/kernel/crash.c # kdump 0
arch/x86/oprofile/nmi_int.c # oprofile 0
arch/x86/oprofile/nmi_timer_int.c # oprofile timer fallback 0

# drivers:

drivers/watchdog/hpwdt.c # HP watchdog driver +INT_MAX
drivers/char/ipmi/ipmi_watchdog.c # IPMI watchdog 150
drivers/misc/sgi-xp/xpc_main.c # SGI SN2 0

# upcoming:
arch/x86/mm/kmemcheck/smp.c # in tip/master 0
[ this should change to +INT_MAX ]

One change we could do is to move arch/x86/kernel/crash.c's priority to
-100. That would achieve the following effect: it would execute after
all known sources of NMIs, but before the lowest prio interactive
notifiers such as kgdb.

That way a platform driver could insert at priority -50 and execute
after the known self-generated NMI sources, but before any of the debug
and crash-tool notifiers. And the constants should be made symbolic as
well, so that we encode the purpose of a notifier, not the
implementation - and can map the purpose to priority and shape the
ordering flexibly.

furthermore, as i indicated it in my first reply, if the port 61H read
is causing problems on this platform (and i'd not be surprised if it did
so) we can turn it into a lowest-prio fallback notifier which only
executes if no known sources of NMIs show interest in an event.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/