Re: 2.6.38-rc2: Uhhuh. NMI received for unknown reason 2d on CPU 0.

From: George Spelvin
Date: Wed Feb 16 2011 - 06:57:19 EST


> Ping on this problem, still seeing
>
> Uhhuh. NMI received for unknown reason 3c on CPU 0.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
>
> on my Pentium-D system here with latest Linus head.
>
> It's sometimes 3c, sometimes 3d. I'm going to bisect and push for
> reverts if nobody has any clue about how to fix this.

The second patch (not the one you quote) fixed it for me. Almost 8 days
of uptime and no log spam.

It's appended below for your convenience. Are you using this
unsuccessfully?


From: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
Subject: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test

A couple of people have reported an unknown NMI issue on p4 pmu.
This patch should fix it.

Reported-by: George Spelvin <linux@xxxxxxxxxxx>
Reported-by: Meelis Roos <mroos@xxxxxxxx>
Reported-by: Don Zickus <dzickus@xxxxxxxxxx>
Signed-off-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
CC: Ingo Molnar <mingo@xxxxxxx>
CC: Lin Ming <ming.m.lin@xxxxxxxxx>
CC: Don Zickus <dzickus@xxxxxxxxxx>
CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
---
arch/x86/include/asm/perf_event_p4.h | 1 +
arch/x86/kernel/cpu/perf_event_p4.c | 11 ++++++++---
2 files changed, 9 insertions(+), 3 deletions(-)

Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
===================================================================
--- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
@@ -22,6 +22,7 @@

#define ARCH_P4_CNTRVAL_BITS (40)
#define ARCH_P4_CNTRVAL_MASK ((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
+#define ARCH_P4_UNFLAGGED_BIT ((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))

#define P4_ESCR_EVENT_MASK 0x7e000000U
#define P4_ESCR_EVENT_SHIFT 25
Index: linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
===================================================================
--- linux-2.6.tip.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
@@ -770,9 +770,14 @@ static inline int p4_pmu_clear_cccr_ovf(
 		return 1;
 	}

-	/* it might be unflagged overflow */
-	rdmsrl(hwc->event_base + hwc->idx, v);
-	if (!(v & ARCH_P4_CNTRVAL_MASK))
+	/*
+	 * at some circumstances the overflow might issue NMI but did
+	 * not set P4_CCCR_OVF bit so since a counter holds a negative value
+	 * we simply check for high bit being set, if it's cleared it means
+	 * the counter has reached zero value and continued counting before
+	 * real NMI signal was received
+	 */
+	if (!(v & ARCH_P4_UNFLAGGED_BIT))
 		return 1;

 	return 0;