Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback

From: Andi Kleen
Date: Thu Sep 04 2008 - 14:50:10 EST


On Thu, Sep 04, 2008 at 02:26:37PM -0400, Don Zickus wrote:
> On Thu, Sep 04, 2008 at 07:52:31PM +0200, Andi Kleen wrote:
> > On Thu, Sep 04, 2008 at 01:20:52PM -0400, Don Zickus wrote:
> > > On Thu, Sep 04, 2008 at 05:52:17PM +0200, Andi Kleen wrote:
> > > > Then if there's a chipset specific NMI driver it could
> > > > also check if the chipset raised it. That would be a possible
> > > > solution for HP -- they would need to implement such a driver
> > > > for their systems with the special watchdog.
> > >
> > > The thing with HP's special watchdog timer is that it does _not_ have a
> > > chipset specific NMI it is trying to catch. HP is going on the assumption
> > > that _all_ NMIs are /bad/ and they want to catch _every_ NMI, log it, and
> > > reboot the system.
> >
> > That's my point. If you have drivers which can identify all other
> > NMIs then the left over NMIs must come from that watchdog driver.
> > So they just need drivers which can do that for their chipsets.
>
> Except their chipsets are _not_ producing NMIs. They just want to

They will produce NMIs when the relevant error conditions occur.
That is why the fallback assumes a chipset problem.

So the only reliable way to find out if the event really came from
the misdesigned watchdog (whoever designed it didn't understand
NMIs, I would say) is to check the chipset (and all other
sources). Hopefully they are all better designed and can
actually tell you if they triggered or not.

Also there's the issue that sometimes people want
the fallback to be the NMI button.
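
A rough sketch of the elimination idea, written as plain user-space C
rather than kernel code. All names here (nmi_source, classify_nmi,
the verdict enum, flag_is_set) are made up for illustration and are
not an existing kernel interface. Each source is assumed to be able to
report whether it raised the NMI; whatever nobody claims goes to a
configurable fallback, which could be the watchdog or the NMI button.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the kernel's NMI API. */
enum nmi_verdict {
	NMI_CLAIMED,            /* some known source says it raised the NMI */
	NMI_FALLBACK_WATCHDOG,  /* unclaimed, attributed to the watchdog */
	NMI_FALLBACK_BUTTON     /* unclaimed, attributed to the NMI button */
};

/* Returns true if this source's status says it raised the NMI. */
typedef bool (*nmi_source_check)(void *ctx);

struct nmi_source {
	const char *name;
	nmi_source_check raised;
	void *ctx;
};

/*
 * Ask every known source in order. The first one that claims the NMI
 * wins; if nobody claims it, return the configured fallback.
 * *claimed_by is set to the claiming source's name, or NULL.
 */
enum nmi_verdict classify_nmi(const struct nmi_source *srcs, size_t n,
			      enum nmi_verdict fallback,
			      const char **claimed_by)
{
	for (size_t i = 0; i < n; i++) {
		if (srcs[i].raised(srcs[i].ctx)) {
			if (claimed_by)
				*claimed_by = srcs[i].name;
			return NMI_CLAIMED;
		}
	}
	if (claimed_by)
		*claimed_by = NULL;
	return fallback;
}

/* Stand-in for reading a hardware status bit: ctx points at a bool. */
bool flag_is_set(void *ctx)
{
	return *(const bool *)ctx;
}
```

Note that this inherits the race already mentioned: if the watchdog
and a chipset error fire at about the same time, only the first
claimer is seen and the watchdog event is lost.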

> > It's not race free, but that's simply not possible with the x86
> > NMI architecture.
>
> I agree.
>
> >
> > Better would be probably to just configure the watchdog
> > to reboot the system directly on its own. Most other watchdogs
> > I'm aware of do that. That's more reliable anyways because the system
> > might be wedged enough to not be able to process NMIs anymore.
>
> The trick is they want to log it in a special way (BIOS or NVRAM or
> something I forget) before rebooting.

Then why tell the OS? It should be an SMI then.


-Andi
--
ak@xxxxxxxxxxxxxxx