Re: [PATCH v2 2/2] perf/core: Reduce PMU access to adjust sample freq

From: kernel test robot
Date: Thu Jan 18 2024 - 21:42:30 EST




Hello,

kernel test robot noticed "WARNING:at_arch/x86/events/core.c:#x86_pmu_start" on:

commit: d6da92786f901cc4ce3588f101182758da295dbb ("[PATCH v2 2/2] perf/core: Reduce PMU access to adjust sample freq")
url: https://github.com/intel-lab-lkp/linux/commits/namhyung-kernel-org/perf-core-Reduce-PMU-access-to-adjust-sample-freq/20240112-044505
base: https://git.kernel.org/cgit/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
patch link: https://lore.kernel.org/all/20240111204348.669673-2-namhyung@xxxxxxxxxx/
patch subject: [PATCH v2 2/2] perf/core: Reduce PMU access to adjust sample freq

in testcase: will-it-scale
version: will-it-scale-x86_64-75f66e4-1_20240111
with the following parameters:

nr_task: 16
mode: thread
test: pipe1
cpufreq_governor: performance



compiler: gcc-12
test machine: 104 threads 2 sockets (Skylake) with 192G memory

(please refer to the attached dmesg/kmsg for the entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202401191023.d52a4ad4-oliver.sang@xxxxxxxxx


[ 102.087071][ C94] ------------[ cut here ]------------
[ 102.092623][ C94] WARNING: CPU: 94 PID: 0 at arch/x86/events/core.c:1507 x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
[ 102.101826][ C94] Modules linked in: intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp btrfs coretemp blake2b_generic xor kvm_intel kvm raid6_pq libcrc32c irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sd_mod sg rapl ipmi_ssif nvme nvme_core ahci intel_cstate acpi_ipmi t10_pi libahci ast crc64_rocksoft_generic drm_shmem_helper mei_me ipmi_si crc64_rocksoft i2c_i801 ioatdma libata intel_uncore drm_kms_helper joydev crc64 mei ipmi_devintf lpc_ich i2c_smbus intel_pch_thermal dca wmi ipmi_msghandler acpi_pad acpi_power_meter drm fuse ip_tables
[ 102.158393][ C94] CPU: 94 PID: 0 Comm: swapper/94 Not tainted 6.7.0-rc6-00192-gd6da92786f90 #1
[ 102.167472][ C94] RIP: 0010:x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
[ 102.172832][ C94] Code: 00 00 4c 0f ab 65 00 48 89 df e8 16 08 01 00 48 89 df 5b 5d 41 5c e9 4a c6 33 00 0f 0b 5b 5d 41 5c c3 cc cc cc cc 0f 0b eb f3 <0f> 0b eb b6 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00
All code
========
0: 00 00 add %al,(%rax)
2: 4c 0f ab 65 00 bts %r12,0x0(%rbp)
7: 48 89 df mov %rbx,%rdi
a: e8 16 08 01 00 callq 0x10825
f: 48 89 df mov %rbx,%rdi
12: 5b pop %rbx
13: 5d pop %rbp
14: 41 5c pop %r12
16: e9 4a c6 33 00 jmpq 0x33c665
1b: 0f 0b ud2
1d: 5b pop %rbx
1e: 5d pop %rbp
1f: 41 5c pop %r12
21: c3 retq
22: cc int3
23: cc int3
24: cc int3
25: cc int3
26: 0f 0b ud2
28: eb f3 jmp 0x1d
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb b6 jmp 0xffffffffffffffe4
2e: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
35: 00 00 00 00
39: 66 data16
3a: 66 data16
3b: 2e cs
3c: 0f .byte 0xf
3d: 1f (bad)
3e: 84 00 test %al,(%rax)

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb b6 jmp 0xffffffffffffffba
4: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
b: 00 00 00 00
f: 66 data16
10: 66 data16
11: 2e cs
12: 0f .byte 0xf
13: 1f (bad)
14: 84 00 test %al,(%rax)
[ 102.192917][ C94] RSP: 0018:ffffc9000ddb0e00 EFLAGS: 00010046
[ 102.199175][ C94] RAX: 0000000000000001 RBX: ffff88b01d17a290 RCX: 0000000000000349
[ 102.207339][ C94] RDX: 0000000000002ff0 RSI: 0000000000000002 RDI: ffff88b01d17a290
[ 102.215509][ C94] RBP: ffff88afa149a220 R08: 0000000000000000 R09: 0000000000000014
[ 102.223684][ C94] R10: 000000000000000f R11: 00000000000f4240 R12: 0000000000000003
[ 102.231855][ C94] R13: 0000000000000001 R14: ffff88afa14b9680 R15: 000000000000005e
[ 102.240038][ C94] FS: 0000000000000000(0000) GS:ffff88afa1480000(0000) knlGS:0000000000000000
[ 102.249178][ C94] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 102.255986][ C94] CR2: 00007f9cdf69ec98 CR3: 000000303e01c002 CR4: 00000000007706f0
[ 102.264179][ C94] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 102.272365][ C94] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 102.280552][ C94] PKRU: 55555554
[ 102.284322][ C94] Call Trace:
[ 102.287830][ C94] <IRQ>
[ 102.290895][ C94] ? x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
[ 102.295695][ C94] ? __warn (kernel/panic.c:677)
[ 102.299980][ C94] ? x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
[ 102.304768][ C94] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 102.309473][ C94] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 102.314006][ C94] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 102.318879][ C94] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 102.324101][ C94] ? x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
[ 102.328888][ C94] perf_adjust_freq_unthr_events (kernel/events/core.c:4181 (discriminator 4))
[ 102.335069][ C94] perf_adjust_freq_unthr_context (kernel/events/core.c:4216)
[ 102.341244][ C94] perf_event_task_tick (arch/x86/include/asm/current.h:41 kernel/events/core.c:4363)
[ 102.346458][ C94] scheduler_tick (kernel/sched/core.c:5665)
[ 102.351240][ C94] update_process_times (kernel/time/timer.c:2079)
[ 102.356442][ C94] tick_sched_handle (kernel/time/tick-sched.c:256)
[ 102.361381][ C94] tick_nohz_highres_handler (kernel/time/tick-sched.c:1525)
[ 102.367021][ C94] ? __pfx_tick_nohz_highres_handler (kernel/time/tick-sched.c:1503)
[ 102.373345][ C94] __hrtimer_run_queues (kernel/time/hrtimer.c:1688 kernel/time/hrtimer.c:1752)
[ 102.378720][ C94] hrtimer_interrupt (kernel/time/hrtimer.c:1817)
[ 102.383748][ C94] __sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1065 arch/x86/kernel/apic/apic.c:1082)
[ 102.389818][ C94] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14))
[ 102.395636][ C94] </IRQ>
[ 102.398759][ C94] <TASK>
[ 102.401872][ C94] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:649)
[ 102.408032][ C94] RIP: 0010:cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
[ 102.414008][ C94] Code: 00 e8 9e 46 19 ff e8 d9 f1 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 07 2e 18 ff 45 84 ff 0f 85 d2 00 00 00 fb 45 85 f6 <0f> 88 83 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d 0c c4 48
All code
========
0: 00 e8 add %ch,%al
2: 9e sahf
3: 46 19 ff rex.RX sbb %r15d,%edi
6: e8 d9 f1 ff ff callq 0xfffffffffffff1e4
b: 8b 53 04 mov 0x4(%rbx),%edx
e: 49 89 c5 mov %rax,%r13
11: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
16: 31 ff xor %edi,%edi
18: e8 07 2e 18 ff callq 0xffffffffff182e24
1d: 45 84 ff test %r15b,%r15b
20: 0f 85 d2 00 00 00 jne 0xf8
26: fb sti
27: 45 85 f6 test %r14d,%r14d
2a:* 0f 88 83 01 00 00 js 0x1b3 <-- trapping instruction
30: 49 63 d6 movslq %r14d,%rdx
33: 48 8d 04 52 lea (%rdx,%rdx,2),%rax
37: 48 8d 04 82 lea (%rdx,%rax,4),%rax
3b: 49 8d 0c c4 lea (%r12,%rax,8),%rcx
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 0f 88 83 01 00 00 js 0x189
6: 49 63 d6 movslq %r14d,%rdx
9: 48 8d 04 52 lea (%rdx,%rdx,2),%rax
d: 48 8d 04 82 lea (%rdx,%rax,4),%rax
11: 49 8d 0c c4 lea (%r12,%rax,8),%rcx
15: 48 rex.W


The kernel config and the materials needed to reproduce are available at:
https://download.01.org/0day-ci/archive/20240119/202401191023.d52a4ad4-oliver.sang@xxxxxxxxx



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki