perf: WARNING perfevents: irq loop stuck!

From: Vince Weaver
Date: Thu Apr 30 2015 - 16:59:06 EST



So this is just a warning, and I've reported it before, but the
perf_fuzzer triggers this fairly regularly on my Haswell system.

It looks like fixed counter 0 (retired instructions) occasionally gets set to
0x0000fffffffffffe, which triggers an irq loop storm; the handler stays stuck
until the PMU state is cleared.
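
For reference, here's a quick userspace sketch (my own illustration, not
kernel code) of why that counter value is so nasty, assuming the usual
48-bit counter width on Haswell: 0x0000fffffffffffe is only two retired
instructions short of wrapping, so the counter presumably overflows again
before the NMI handler can even return, and the handler keeps looping until
the irq-loop guard gives up and resets the PMU.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            /* counter width on Haswell (CPUID leaf 0xA reports 48 bits) */
            const int width = 48;
            /* fixed-PMC0 value from the dump below */
            const uint64_t count = 0x0000fffffffffffeULL;
            /* retired instructions left before the counter wraps and re-fires */
            uint64_t to_overflow = (1ULL << width) - count;

            printf("events until overflow: %llu\n",
                   (unsigned long long)to_overflow);  /* prints 2 */
            return 0;
    }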

[ 8224.179407] ------------[ cut here ]------------
[ 8224.184368] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/perf_event_intel.c:1602 intel_pmu_handle_irq+0x2bc/0x450()
[ 8224.195686] perfevents: irq loop stuck!
[ 8224.199835] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm crct10dif_pclmul snd_hda_codec_hdmi cr$
[ 8224.273188] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.0.0+ #136
[ 8224.281012] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 8224.288988] ffffffff81a093a8 ffff88011ea05b50 ffffffff816d5639 0000000000000000
[ 8224.297058] ffff88011ea05ba0 ffff88011ea05b90 ffffffff81072e4a ffff880119b87090
[ 8224.306453] 0000000000000064 ffff88011ea0bd80 ffff8800c38c7800 0000000000000040
[ 8224.315894] Call Trace:
[ 8224.319886] <NMI> [<ffffffff816d5639>] dump_stack+0x45/0x57
[ 8224.327439] [<ffffffff81072e4a>] warn_slowpath_common+0x8a/0xc0
[ 8224.335238] [<ffffffff81072ec6>] warn_slowpath_fmt+0x46/0x50
[ 8224.342812] [<ffffffff8103399c>] intel_pmu_handle_irq+0x2bc/0x450
[ 8224.350830] [<ffffffff81164215>] ? perf_event_task_tick+0x1d5/0x2b0
[ 8224.359025] [<ffffffff8102a40b>] perf_event_nmi_handler+0x2b/0x50
[ 8224.367069] [<ffffffff810185c0>] nmi_handle+0xa0/0x150
[ 8224.374067] [<ffffffff81018525>] ? nmi_handle+0x5/0x150
[ 8224.381169] [<ffffffff8101886a>] default_do_nmi+0x4a/0x140
[ 8224.388487] [<ffffffff810189f8>] do_nmi+0x98/0xe0
[ 8224.394886] [<ffffffff816df8af>] end_repeat_nmi+0x1e/0x2e
[ 8224.402082] [<ffffffff8105bffa>] ? native_write_msr_safe+0xa/0x10
[ 8224.409958] [<ffffffff8105bffa>] ? native_write_msr_safe+0xa/0x10
[ 8224.417785] [<ffffffff8105bffa>] ? native_write_msr_safe+0xa/0x10
[ 8224.425536] <<EOE>> <IRQ> [<ffffffff8103356e>] intel_pmu_enable_event+0x20e/0x230
[ 8224.435001] [<ffffffff8102b721>] x86_pmu_start+0x81/0x120
[ 8224.442007] [<ffffffff8102bf25>] x86_pmu_enable+0x295/0x310
[ 8224.449150] [<ffffffff811621aa>] perf_pmu_enable+0x2a/0x30
[ 8224.456173] [<ffffffff81029fe8>] x86_pmu_commit_txn+0x78/0xa0
[ 8224.463561] [<ffffffff81427f6b>] ? debug_object_activate+0x14b/0x1e0
[ 8224.471591] [<ffffffff810bdde9>] ? __lock_acquire.isra.31+0x3b9/0x1000
[ 8224.479715] [<ffffffff81427f6b>] ? debug_object_activate+0x14b/0x1e0
[ 8224.487676] [<ffffffff810bcba4>] ? __lock_is_held+0x54/0x80
[ 8224.494794] [<ffffffff81162d50>] ? event_sched_in.isra.85+0x190/0x330
[ 8224.502894] [<ffffffff811630a8>] group_sched_in+0x1b8/0x1e0
[ 8224.510082] [<ffffffff8101df1a>] ? native_sched_clock+0x2a/0x90
[ 8224.517646] [<ffffffff811637bc>] __perf_event_enable+0x25c/0x2a0
[ 8224.525297] [<ffffffff810f3e69>] ? tick_nohz_irq_exit+0x29/0x30
[ 8224.532776] [<ffffffff8115ef30>] ? cpu_clock_event_start+0x40/0x40
[ 8224.540595] [<ffffffff8115ef80>] remote_function+0x50/0x60
[ 8224.547668] [<ffffffff810f8cd1>] flush_smp_call_function_queue+0x81/0x180
[ 8224.556126] [<ffffffff810f9763>] generic_smp_call_function_single_interrupt+0x13/0x60
[ 8224.565749] [<ffffffff8104b5c8>] smp_trace_call_function_single_interrupt+0x38/0xc0
[ 8224.575132] [<ffffffff816de9be>] trace_call_function_single_interrupt+0x6e/0x80
[ 8224.584168] <EOI> [<ffffffff8156e8b5>] ? cpuidle_enter_state+0x65/0x160
[ 8224.592646] [<ffffffff8156e8a1>] ? cpuidle_enter_state+0x51/0x160
[ 8224.600408] [<ffffffff8156e9e7>] cpuidle_enter+0x17/0x20
[ 8224.607285] [<ffffffff810b6e39>] cpu_startup_entry+0x399/0x440
[ 8224.614712] [<ffffffff816c9ddb>] rest_init+0xbb/0xd0
[ 8224.621283] [<ffffffff81d46f74>] start_kernel+0x44e/0x45b
[ 8224.628212] [<ffffffff81d46120>] ? early_idt_handlers+0x120/0x120
[ 8224.635980] [<ffffffff81d464d7>] x86_64_start_reservations+0x2a/0x2c
[ 8224.643999] [<ffffffff81d46614>] x86_64_start_kernel+0x13b/0x14a
[ 8224.651691] ---[ end trace 5679ca0875946dba ]---
[ 8224.657764]
[ 8224.660447] CPU#0: ctrl: 0000000000000000
[ 8224.666448] CPU#0: status: 0000000000000000
[ 8224.672414] CPU#0: overflow: 0000000000000000
[ 8224.678373] CPU#0: fixed: 00000000000000b9
[ 8224.684338] CPU#0: pebs: 0000000000000000
[ 8224.690322] CPU#0: debugctl: 0000000000000000
[ 8224.696253] CPU#0: active: 0000000300000000
[ 8224.702232] CPU#0: gen-PMC0 ctrl: 00000000108001c2
[ 8224.708823] CPU#0: gen-PMC0 count: 0000000000000001
[ 8224.715371] CPU#0: gen-PMC0 left: 0000ffffffffffff
[ 8224.721914] CPU#0: gen-PMC1 ctrl: 00000000001382d0
[ 8224.728402] CPU#0: gen-PMC1 count: 0000000000ff3792
[ 8224.734849] CPU#0: gen-PMC1 left: 0000ffffff012119
[ 8224.741271] CPU#0: gen-PMC2 ctrl: 00000003ff96764b
[ 8224.747600] CPU#0: gen-PMC2 count: 0000000000000001
[ 8224.753884] CPU#0: gen-PMC2 left: 0000ffffffffffff
[ 8224.760100] CPU#0: gen-PMC3 ctrl: 0000000000131acd
[ 8224.766320] CPU#0: gen-PMC3 count: 000000000000288a
[ 8224.772585] CPU#0: gen-PMC3 left: 0000ffffffffffff
[ 8224.778769] CPU#0: fixed-PMC0 count: 0000fffffffffffe
[ 8224.784973] CPU#0: fixed-PMC1 count: 0000fffda1877268
[ 8224.791196] CPU#0: fixed-PMC2 count: 0000000000f222eb
[ 8224.797450] perf_event_intel: clearing PMU state on CPU#0
[ 8224.804063] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 624.630 msecs
[ 8224.813789] perf interrupt took too long (4880480 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 8224.825400] perf interrupt took too long (4842512 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[ 8225.007411] perf interrupt took too long (4804688 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
[ 8225.244832] perf interrupt took too long (4767159 > 20000), lowering kernel.perf_event_max_sample_rate to 6250
[ 8225.295535] perf interrupt took too long (4729923 > 38461), lowering kernel.perf_event_max_sample_rate to 3250
[ 8225.747314] perf interrupt took too long (4692980 > 71428), lowering kernel.perf_event_max_sample_rate to 1750
...
[202220.612133] perf_event_intel: clearing PMU state on CPU#6
[202220.665533] perf_event_intel: clearing PMU state on CPU#6
[202220.674658] perf_event_intel: clearing PMU state on CPU#6
[202220.734200] perf_event_intel: clearing PMU state on CPU#6
[202220.966526] perf_event_intel: clearing PMU state on CPU#6
[202220.976874] perf_event_intel: clearing PMU state on CPU#6
[202222.950479] perf_event_intel: clearing PMU state on CPU#6
[202223.183925] perf_event_intel: clearing PMU state on CPU#6
[212335.610297] perf_event_intel: clearing PMU state on CPU#2
[212335.922425] perf_event_intel: clearing PMU state on CPU#2
[212336.014458] perf_event_intel: clearing PMU state on CPU#2
[212336.258568] perf_event_intel: clearing PMU state on CPU#2
[226783.599841] perf_event_intel: clearing PMU state on CPU#0
[269662.113390] perf_event_intel: clearing PMU state on CPU#4
[401640.647353] perf_event_intel: clearing PMU state on CPU#3
[409463.976440] perf_event_intel: clearing PMU state on CPU#6
[409464.172527] perf_event_intel: clearing PMU state on CPU#6
[409464.692722] perf_event_intel: clearing PMU state on CPU#6
[422018.413829] perf_event_intel: clearing PMU state on CPU#1
[422018.458279] perf_event_intel: clearing PMU state on CPU#1
[422018.496802] perf_event_intel: clearing PMU state on CPU#1
[422018.510058] perf_event_intel: clearing PMU state on CPU#1
[422018.523668] perf_event_intel: clearing PMU state on CPU#1
[422018.549935] perf_event_intel: clearing PMU state on CPU#1
[422019.274748] perf_event_intel: clearing PMU state on CPU#1