Re: 2.6.38-rc2: Uhhuh. NMI received for unknown reason 2d on CPU 0.

From: Dave Airlie
Date: Wed Feb 16 2011 - 21:56:10 EST


>
> It's appended below for your convenience.  Are you using this
> unsuccessfully?

The patch quoted below fixes it for me.

No more spurious NMIs on my P4.

Tested-by: Dave Airlie <airlied@xxxxxxxxxx>

>
>
> From: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> Subject: [PATCH] perf, x86: P4 PMU -- Fix unflagged overflows test
>
> A couple of people have reported unknown NMIs on the P4 PMU.
> This patch should fix it.
>
> Reported-by: George Spelvin <linux@xxxxxxxxxxx>
> Reported-by: Meelis Roos <mroos@xxxxxxxx>
> Reported-by: Don Zickus <dzickus@xxxxxxxxxx>
> Signed-off-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> CC: Ingo Molnar <mingo@xxxxxxx>
> CC: Lin Ming <ming.m.lin@xxxxxxxxx>
> CC: Don Zickus <dzickus@xxxxxxxxxx>
> CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> ---
>  arch/x86/include/asm/perf_event_p4.h |    1 +
>  arch/x86/kernel/cpu/perf_event_p4.c  |   11 ++++++++---
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> Index: linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
> ===================================================================
> --- linux-2.6.tip.orig/arch/x86/include/asm/perf_event_p4.h
> +++ linux-2.6.tip/arch/x86/include/asm/perf_event_p4.h
> @@ -22,6 +22,7 @@
>
>  #define ARCH_P4_CNTRVAL_BITS   (40)
>  #define ARCH_P4_CNTRVAL_MASK   ((1ULL << ARCH_P4_CNTRVAL_BITS) - 1)
> +#define ARCH_P4_UNFLAGGED_BIT  ((1ULL) << (ARCH_P4_CNTRVAL_BITS - 1))
>
>  #define P4_ESCR_EVENT_MASK     0x7e000000U
>  #define P4_ESCR_EVENT_SHIFT    25
> Index: linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
> ===================================================================
> --- linux-2.6.tip.orig/arch/x86/kernel/cpu/perf_event_p4.c
> +++ linux-2.6.tip/arch/x86/kernel/cpu/perf_event_p4.c
> @@ -770,9 +770,14 @@ static inline int p4_pmu_clear_cccr_ovf(
>                return 1;
>        }
>
> -       /* it might be unflagged overflow */
> -       rdmsrl(hwc->event_base + hwc->idx, v);
> -       if (!(v & ARCH_P4_CNTRVAL_MASK))
> +       /*
> +        * In some circumstances the overflow might issue an NMI without
> +        * setting the P4_CCCR_OVF bit. Since the counter holds a negative
> +        * value, we simply check whether the high bit is set; if it is
> +        * cleared, the counter has reached zero and continued counting
> +        * before the real NMI signal was received.
> +        */
> +       if (!(v & ARCH_P4_UNFLAGGED_BIT))
>                return 1;
>
>        return 0;
>