Re: [PATCH v2 2/2] perf/core: Reduce PMU access to adjust sample freq

From: Namhyung Kim
Date: Fri Jan 19 2024 - 16:01:35 EST


Hello,

On Thu, Jan 18, 2024 at 6:41 PM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed "WARNING:at_arch/x86/events/core.c:#x86_pmu_start" on:
>
> commit: d6da92786f901cc4ce3588f101182758da295dbb ("[PATCH v2 2/2] perf/core: Reduce PMU access to adjust sample freq")
> url: https://github.com/intel-lab-lkp/linux/commits/namhyung-kernel-org/perf-core-Reduce-PMU-access-to-adjust-sample-freq/20240112-044505
> base: https://git.kernel.org/cgit/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
> patch link: https://lore.kernel.org/all/20240111204348.669673-2-namhyung@xxxxxxxxxx/
> patch subject: [PATCH v2 2/2] perf/core: Reduce PMU access to adjust sample freq
>
> in testcase: will-it-scale
> version: will-it-scale-x86_64-75f66e4-1_20240111
> with following parameters:
>
> nr_task: 16
> mode: thread
> test: pipe1
> cpufreq_governor: performance
>
>
>
> compiler: gcc-12
> test machine: 104 threads 2 sockets (Skylake) with 192G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)

Thanks for the report. It seems the code calls x86_pmu_stop() without
PERF_EF_UPDATE when the event is throttled, so we cannot simply skip the
stop callback for frequency events. I'll update it in v3.

Thanks,
Namhyung

>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add the following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202401191023.d52a4ad4-oliver.sang@xxxxxxxxx
>
>
> [ 102.087071][ C94] ------------[ cut here ]------------
> [ 102.092623][ C94] WARNING: CPU: 94 PID: 0 at arch/x86/events/core.c:1507 x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
> [ 102.101826][ C94] Modules linked in: intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp btrfs coretemp blake2b_generic xor kvm_intel kvm raid6_pq libcrc32c irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sd_mod sg rapl ipmi_ssif nvme nvme_core ahci intel_cstate acpi_ipmi t10_pi libahci ast crc64_rocksoft_generic drm_shmem_helper mei_me ipmi_si crc64_rocksoft i2c_i801 ioatdma libata intel_uncore drm_kms_helper joydev crc64 mei ipmi_devintf lpc_ich i2c_smbus intel_pch_thermal dca wmi ipmi_msghandler acpi_pad acpi_power_meter drm fuse ip_tables
> [ 102.158393][ C94] CPU: 94 PID: 0 Comm: swapper/94 Not tainted 6.7.0-rc6-00192-gd6da92786f90 #1
> [ 102.167472][ C94] RIP: 0010:x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
> [ 102.172832][ C94] Code: 00 00 4c 0f ab 65 00 48 89 df e8 16 08 01 00 48 89 df 5b 5d 41 5c e9 4a c6 33 00 0f 0b 5b 5d 41 5c c3 cc cc cc cc 0f 0b eb f3 <0f> 0b eb b6 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00
> All code
> ========
> 0: 00 00 add %al,(%rax)
> 2: 4c 0f ab 65 00 bts %r12,0x0(%rbp)
> 7: 48 89 df mov %rbx,%rdi
> a: e8 16 08 01 00 callq 0x10825
> f: 48 89 df mov %rbx,%rdi
> 12: 5b pop %rbx
> 13: 5d pop %rbp
> 14: 41 5c pop %r12
> 16: e9 4a c6 33 00 jmpq 0x33c665
> 1b: 0f 0b ud2
> 1d: 5b pop %rbx
> 1e: 5d pop %rbp
> 1f: 41 5c pop %r12
> 21: c3 retq
> 22: cc int3
> 23: cc int3
> 24: cc int3
> 25: cc int3
> 26: 0f 0b ud2
> 28: eb f3 jmp 0x1d
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: eb b6 jmp 0xffffffffffffffe4
> 2e: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
> 35: 00 00 00 00
> 39: 66 data16
> 3a: 66 data16
> 3b: 2e cs
> 3c: 0f .byte 0xf
> 3d: 1f (bad)
> 3e: 84 00 test %al,(%rax)
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: eb b6 jmp 0xffffffffffffffba
> 4: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
> b: 00 00 00 00
> f: 66 data16
> 10: 66 data16
> 11: 2e cs
> 12: 0f .byte 0xf
> 13: 1f (bad)
> 14: 84 00 test %al,(%rax)
> [ 102.192917][ C94] RSP: 0018:ffffc9000ddb0e00 EFLAGS: 00010046
> [ 102.199175][ C94] RAX: 0000000000000001 RBX: ffff88b01d17a290 RCX: 0000000000000349
> [ 102.207339][ C94] RDX: 0000000000002ff0 RSI: 0000000000000002 RDI: ffff88b01d17a290
> [ 102.215509][ C94] RBP: ffff88afa149a220 R08: 0000000000000000 R09: 0000000000000014
> [ 102.223684][ C94] R10: 000000000000000f R11: 00000000000f4240 R12: 0000000000000003
> [ 102.231855][ C94] R13: 0000000000000001 R14: ffff88afa14b9680 R15: 000000000000005e
> [ 102.240038][ C94] FS: 0000000000000000(0000) GS:ffff88afa1480000(0000) knlGS:0000000000000000
> [ 102.249178][ C94] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 102.255986][ C94] CR2: 00007f9cdf69ec98 CR3: 000000303e01c002 CR4: 00000000007706f0
> [ 102.264179][ C94] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 102.272365][ C94] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 102.280552][ C94] PKRU: 55555554
> [ 102.284322][ C94] Call Trace:
> [ 102.287830][ C94] <IRQ>
> [ 102.290895][ C94] ? x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
> [ 102.295695][ C94] ? __warn (kernel/panic.c:677)
> [ 102.299980][ C94] ? x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
> [ 102.304768][ C94] ? report_bug (lib/bug.c:180 lib/bug.c:219)
> [ 102.309473][ C94] ? handle_bug (arch/x86/kernel/traps.c:237)
> [ 102.314006][ C94] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
> [ 102.318879][ C94] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
> [ 102.324101][ C94] ? x86_pmu_start (arch/x86/events/core.c:1507 (discriminator 1))
> [ 102.328888][ C94] perf_adjust_freq_unthr_events (kernel/events/core.c:4181 (discriminator 4))
> [ 102.335069][ C94] perf_adjust_freq_unthr_context (kernel/events/core.c:4216)
> [ 102.341244][ C94] perf_event_task_tick (arch/x86/include/asm/current.h:41 kernel/events/core.c:4363)
> [ 102.346458][ C94] scheduler_tick (kernel/sched/core.c:5665)
> [ 102.351240][ C94] update_process_times (kernel/time/timer.c:2079)
> [ 102.356442][ C94] tick_sched_handle (kernel/time/tick-sched.c:256)
> [ 102.361381][ C94] tick_nohz_highres_handler (kernel/time/tick-sched.c:1525)
> [ 102.367021][ C94] ? __pfx_tick_nohz_highres_handler (kernel/time/tick-sched.c:1503)
> [ 102.373345][ C94] __hrtimer_run_queues (kernel/time/hrtimer.c:1688 kernel/time/hrtimer.c:1752)
> [ 102.378720][ C94] hrtimer_interrupt (kernel/time/hrtimer.c:1817)
> [ 102.383748][ C94] __sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1065 arch/x86/kernel/apic/apic.c:1082)
> [ 102.389818][ C94] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14))
> [ 102.395636][ C94] </IRQ>
> [ 102.398759][ C94] <TASK>
> [ 102.401872][ C94] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:649)
> [ 102.408032][ C94] RIP: 0010:cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
> [ 102.414008][ C94] Code: 00 e8 9e 46 19 ff e8 d9 f1 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 07 2e 18 ff 45 84 ff 0f 85 d2 00 00 00 fb 45 85 f6 <0f> 88 83 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d 0c c4 48
> All code
> ========
> 0: 00 e8 add %ch,%al
> 2: 9e sahf
> 3: 46 19 ff rex.RX sbb %r15d,%edi
> 6: e8 d9 f1 ff ff callq 0xfffffffffffff1e4
> b: 8b 53 04 mov 0x4(%rbx),%edx
> e: 49 89 c5 mov %rax,%r13
> 11: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> 16: 31 ff xor %edi,%edi
> 18: e8 07 2e 18 ff callq 0xffffffffff182e24
> 1d: 45 84 ff test %r15b,%r15b
> 20: 0f 85 d2 00 00 00 jne 0xf8
> 26: fb sti
> 27: 45 85 f6 test %r14d,%r14d
> 2a:* 0f 88 83 01 00 00 js 0x1b3 <-- trapping instruction
> 30: 49 63 d6 movslq %r14d,%rdx
> 33: 48 8d 04 52 lea (%rdx,%rdx,2),%rax
> 37: 48 8d 04 82 lea (%rdx,%rax,4),%rax
> 3b: 49 8d 0c c4 lea (%r12,%rax,8),%rcx
> 3f: 48 rex.W
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 88 83 01 00 00 js 0x189
> 6: 49 63 d6 movslq %r14d,%rdx
> 9: 48 8d 04 52 lea (%rdx,%rdx,2),%rax
> d: 48 8d 04 82 lea (%rdx,%rax,4),%rax
> 11: 49 8d 0c c4 lea (%r12,%rax,8),%rcx
> 15: 48 rex.W
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240119/202401191023.d52a4ad4-oliver.sang@xxxxxxxxx
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>