Re: 2.6.12-rc2-mm3

From: Mikael Pettersson
Date: Mon Apr 18 2005 - 17:28:46 EST


On Mon, 18 Apr 2005 11:56:15 +0200, Alexander Nyberg wrote:
>> >This patch fixes the NMI checking problems in -mm x64 for me. It
>>
>> What problems?
>>
>
>Sorry, in -mm on x64 check_nmi_watchdog() has started to be run as a
>late_initcall(). Currently it reports the NMIs as stuck on a few systems
>although they are not, both of mine are reported as stuck. This appears
>to be because the current event mask uses don't appear to tick much
>running mdelay() on opteron (in my case).

Please provide a complete dmesg log up to and including the failure
point where the kernel complains about stuck NMIs.

I tried 2.6.12-rc2-mm3 SMP on UP amd64 and I immediately found a
bug triggering bogus stuck NMI failures, and I want to check if
what you're seeing is caused by the same bug.

> Also in -mm because nmi_hz is
>set to 1 in setup_k7_watchdog() the NMI watchdog checking takes 10
>seconds, a bit much.

Orthogonal issue. Let's ignore this one for now.

>Patch below uses RETIRED_UOPS for a more constant rate of NMI sending,

This may or may not work as you intend. There is _no_ documented reason
to assume that RETIRED_UOPS would provide a more steady stream of events
than CYCLES_PROCESSOR_IS_RUNNING. Both events are likely to be idle in
HLT states, for instance.

The local APIC + performance counter driven NMI watchdog simply cannot
provide wall-clock like behaviour. You need the I/O-APIC driven watchdog
for that, or to prevent the kernel from using HLT when idle.

>@@ -68,7 +69,7 @@
> #define K7_EVNTSEL_INT (1 << 20)
> #define K7_EVNTSEL_OS (1 << 17)
> #define K7_EVNTSEL_USR (1 << 16)
>-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0x76
>+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING 0xC1 /* Retired uops */
> #define K7_NMI_EVENT K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING

This is as bogus as "#define ONE 2". CYCLES_PROCESSOR_IS_RUNNING
_is_ event 0x76 (AMD renamed it recently, but that's irrelevant).
Using RETIRED_UOPS requires a new define, and a modification to
the K7_NMI_EVENT #define.

/Mikael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/