Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback

From: Vivek Goyal
Date: Thu Sep 04 2008 - 16:58:20 EST


On Thu, Sep 04, 2008 at 08:01:31PM +0000, Mingarelli, Thomas wrote:
> Exactly.
>
> The hpwdt driver is meant to be a catch-all for any NMI coming through on ProLiant HW only. Moreover, for newer ProLiant HW at that.
>
> Once the NMI comes in, we call into our BIOS for the true reason of the NMI. That message gets logged to the IML in NVRAM for the user to view. We then panic the system.
>
> Yes, kdump will work under this scenario because we stop the watchdog timer. This is a user configurable setting.
>
>

Sorry, I did not get it. A few questions.

- So you want to capture every NMI and then act on it. What's the
harm in registering on the die chain, looking for both the DIE_NMI_IPI
and DIE_NMI events, and taking the appropriate action? Depending on the
reason code, one or the other will be called. If I read the code
correctly, you will get to see every NMI on that cpu irrespective of
the reason, and then you can act accordingly.

- How would kdump continue to work if the above driver hijacks the NMI
callback? You will disable the watchdog, log the message and call panic().
panic() will lead to kdump, and kdump will send an NMI IPI to the rest of
the cpus in the system to save their state and halt them. The moment the
other cpus get the NMI IPI, the above driver will hijack that NMI too and
nobody else gets a chance to run. So kdump will not work?
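
To make the two points above concrete, here is a rough userspace model
of the two dispatch schemes. It is not kernel code and is only a sketch:
the names mirror the kernel's (notifier_block, NOTIFY_DONE, NOTIFY_STOP,
DIE_NMI, DIE_NMI_IPI), but the chain walker, the single-slot
nmi_callback hook, handle_nmi() and all three handlers are simplified
stand-ins that I am assuming for illustration, not the hpwdt or kdump
implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: values and types are illustrative stand-ins. */
enum { DIE_NMI = 1, DIE_NMI_IPI = 2 };
enum { NOTIFY_DONE = 0, NOTIFY_STOP = 1 };

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long val,
			     void *data);
	struct notifier_block *next;
	int priority;
};

struct notifier_block *die_chain;	/* head of the modelled die chain */
int (*nmi_callback)(int cpu);		/* modelled single-slot NMI hook */
int watchdog_seen, crash_seen, hijacked;

/* Insert in descending priority order, so higher priority runs first. */
void register_die_notifier(struct notifier_block *nb)
{
	struct notifier_block **p = &die_chain;

	assert(nb && nb->notifier_call);
	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
}

/* Walk the chain until some handler claims the event. */
int notify_die(unsigned long val)
{
	struct notifier_block *nb;

	for (nb = die_chain; nb; nb = nb->next)
		if (nb->notifier_call(nb, val, NULL) == NOTIFY_STOP)
			return NOTIFY_STOP;
	return NOTIFY_DONE;
}

/* Watchdog-style handler: sees both NMI events, logs, passes them on. */
int watchdog_nmi(struct notifier_block *nb, unsigned long val, void *data)
{
	(void)nb; (void)data;
	if (val == DIE_NMI || val == DIE_NMI_IPI)
		watchdog_seen++;	/* the reason would be logged here */
	return NOTIFY_DONE;		/* let later handlers run too */
}

/* kdump-style handler: claims only the NMI IPI used to stop other cpus. */
int crash_nmi(struct notifier_block *nb, unsigned long val, void *data)
{
	(void)nb; (void)data;
	if (val != DIE_NMI_IPI)
		return NOTIFY_DONE;
	crash_seen++;			/* cpu state would be saved here */
	return NOTIFY_STOP;
}

/* Hijacking callback: claims every NMI, whatever the reason. */
int hijacking_callback(int cpu)
{
	(void)cpu;
	hijacked++;
	return 1;			/* nonzero means "handled" */
}

/* Modelled entry point: the single-slot hook runs before the chain. */
int handle_nmi(unsigned long val)
{
	if (nmi_callback && nmi_callback(0))
		return NOTIFY_STOP;	/* the chain never sees this NMI */
	return notify_die(val);
}

struct notifier_block watchdog_nb = { watchdog_nmi, NULL, 1 };
struct notifier_block crash_nb = { crash_nmi, NULL, 0 };
```

In this model, with both notifier blocks registered,
handle_nmi(DIE_NMI_IPI) reaches the watchdog handler and then the crash
handler. Once nmi_callback points at hijacking_callback, the crash
handler is never called, which is the kdump breakage I am asking about.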

Am I missing something?

Thanks
Vivek



> Tom
>
> -----Original Message-----
> From: Andi Kleen [mailto:andi@xxxxxxxxxxxxxx]
> Sent: Thursday, September 04, 2008 3:01 PM
> To: Vivek Goyal
> Cc: Don Zickus; Andi Kleen; Ingo Molnar; Prarit Bhargava; Peter Zijlstra; linux-kernel@xxxxxxxxxxxxxxx; arozansk@xxxxxxxxxx; Mingarelli, Thomas; ak@xxxxxxxxxxxxxxx; Alan Cox; H. Peter Anvin; Thomas Gleixner; Maciej W. Rozycki
> Subject: Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback
>
> > Add "kdump" to the list. It will also be broken if we decide to let one
> > driver hijack the NMI handler.
>
> kdump is a special case, similar to the NMI button panic mode. It should
> be always only active when the user configured it. When the user configured
> it should be always the fallback and override any other drivers.
>
> But watchdog is a special case. I assume the watchdog will just log
> (and do the work that a SMI should be doing) but then continue
> the chain so that kdump can dump on a watchdog timeout.
>
> -Andi