Re: NMI received for unknown reason, 2.6.38-rc6 regression?

From: denys
Date: Wed Mar 02 2011 - 08:16:52 EST


On Wed, 2 Mar 2011 08:59:31 +0100, Ingo Molnar wrote:
* denys@xxxxxxxxxxx <denys@xxxxxxxxxxx> wrote:

On Tue, 01 Mar 2011 19:08:43 +0300, Cyrill Gorcunov wrote:
>On 03/01/2011 06:03 PM, denys@xxxxxxxxxxx wrote:
>>I upgraded around 140 hosts (from 2.6.33 till 2.6.37), and on many of
>>them got an error/warning flooding the kernel log. Here is a short
>>snapshot:
>>
>>[ 1882.057474] Uhhuh. NMI received for unknown reason 3c on CPU 0.
>>[ 1882.057576] Do you have a strange power saving mode enabled?
>>[ 1882.057672] Dazed and confused, but trying to continue
>>[ 2421.419732] Uhhuh. NMI received for unknown reason 3c on CPU 0.
>>[ 2421.419835] Do you have a strange power saving mode enabled?
>>[ 2421.419930] Dazed and confused, but trying to continue
>>[ 2636.016831] Uhhuh. NMI received for unknown reason 2c on CPU 1.
>>[ 2636.016934] Do you have a strange power saving mode enabled?
>>[ 2636.017003] Dazed and confused, but trying to continue
>>
>>Full dmesg from 2 machines:
>>http://www.nuclearcat.com/dmesg1.txt
>>http://www.nuclearcat.com/dmesg2.txt
>>I can provide more, if required.
>>
>>It seems nmi_watchdog is enabled by default, and it is causing the
>>issue. I am now checking with nmi_watchdog=0, but I need more
>>time to confirm that.
>>Also I am experiencing some problems with PPP users (all of these
>>hosts are PPPoE servers), but I am not sure it is related, so
>>maybe this NMI warning is just a cosmetic regression.
>>
>>All systems are x86, with the same kernel config.
>>If you need more information - let me know.
>>
>
>nmi_watchdog=0 should help here; actually, a nit was fixed by
>https://patchwork.kernel.org/patch/566611/
>which is not in 2.6.38-rc6, but I rather suspect it'll be in -rc7 or
>final .38. If you are able to pick it up and test it, that would be
>great!
I tested it, and it seems to help, at least on one host. And yes, it seems
all of them are P4.

Mind checking -rc7, does it work 'out of box', without requiring any
workarounds?
-rc7 already has this fix included:

7d44ec193d95: perf, x86: P4 PMU: Fix spurious NMI messages

-rc6 did not have it yet.

Yes, rc7 is fine too; I tested it just now.


Thanks,

Ingo
