Re: [Syzkaller & bisect] There is "__perf_event_overflow" WARNING in v6.1-rc5 kernel in guest

From: Pengfei Xu
Date: Wed Nov 16 2022 - 20:37:07 EST


Hi Peter,

On 2022-11-16 at 15:40:24 +0100, Peter Zijlstra wrote:
> On Wed, Nov 16, 2022 at 11:39:53AM +0800, Pengfei Xu wrote:
> > Hi Peter and perf expert,
> >
> > Greeting!
> >
> > Platform: TGL-H
> >
> > There is "__perf_event_overflow" WARNING issue in v6.1-rc5 kernel in
> > guest in double check test.
> >
> > Found first bad commit is ca6c21327c6af02b7eec31ce4b9a740a18c6c13f
> > "perf: Fix missing SIGTRAPs"
> >
> > And revert this commit on top of v6.1-rc5, this issue could not be reproduced.
> >
> > Guest kconfig, reproduce code from syzkaller, and bisect info are in attached.
> >
> > And more detailed info is in link:
> > https://github.com/xupengfe/syzkaller_logs/tree/main/221114_134736___perf_event_overflow
> >
> > If it's helpful and in time, please add the Reported-by tag from me.
>
> Does this help?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=perf/urgent&id=bb88f9695460bec25aa30ba9072595025cf6c8af
Thanks for the link!
I applied the patch from the above link on top of the v6.1-rc5 kernel; the
patch is attached.

I could still reproduce this issue.
I used the loop below to execute the binary, and the issue was reproduced at
loop iteration 485.
"
for ((i=0; ; i++)); do
    echo "$i times ./repro"
    ./repro
done
"
485 times ./repro
[ 87.978410] ------------[ cut here ]------------
[ 87.978430] WARNING: CPU: 0 PID: 970 at kernel/events/core.c:9329 __perf_event_overflow+0x22b/0x270
[ 87.978464] Modules linked in:
[ 87.978470] CPU: 0 PID: 970 Comm: repro Not tainted 6.1.0-rc5-kvmperfoverflow+ #10
[ 87.978487] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 87.978498] RIP: 0010:__perf_event_overflow+0x22b/0x270
[ 87.978516] Code: b6 45 d3 84 c0 0f 84 26 ff ff ff e8 4f 31 ec ff 8b 75 d4 44 89 ff e8 04 32 ec ff 44 3b 7d d4 0f 84 0c ff ff ff e8 35 31 ec ff <0f> 0b e9 00 6
[ 87.978531] RSP: 0000:fffffe000000db58 EFLAGS: 00010046
[ 87.978543] RAX: 0000000000000000 RBX: ffff888004ed5ce0 RCX: ffffffff8138b67c
[ 87.978554] RDX: 0000000000000000 RSI: ffff88800b8e9fc0 RDI: 0000000000000002
[ 87.978565] RBP: fffffe000000db88 R08: 0000001ffad06476 R09: 0000000000000000
[ 87.978583] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe000000def8
[ 87.978594] R13: fffffe000000dc00 R14: 0000000000000000 R15: 0000000088db7a04
[ 87.978607] FS: 00007f9988c8b740(0000) GS:ffff88807dc00000(0000) knlGS:ffff88807dc00000
[ 87.978625] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 87.978639] CR2: 0000000020000200 CR3: 000000000a77c003 CR4: 0000000000770ef0
[ 87.978655] PKRU: 55555554
[ 87.978662] Call Trace:
[ 87.978667] <NMI>
[ 87.978675] perf_event_overflow+0x33/0x40
[ 87.978705] handle_pmi_common+0x2d8/0x560
[ 87.978736] ? write_comp_data+0x2f/0x90
[ 87.978760] ? write_comp_data+0x2f/0x90
[ 87.978783] intel_pmu_handle_irq+0x183/0x680
[ 87.978805] perf_event_nmi_handler+0x42/0x70
[ 87.978842] nmi_handle+0x63/0x160
[ 87.978872] default_do_nmi+0x77/0x190
[ 87.978903] exc_nmi+0x157/0x190
[ 87.978932] end_repeat_nmi+0x16/0x67
[ 87.978963] RIP: 0010:asm_sysvec_irq_work+0x0/0x30
[ 87.978993] Code: ca fc 6a ff e8 a1 05 00 00 48 89 c4 48 8d 6c 24 01 48 89 e7 e8 e1 56 ea ff e9 bc 06 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 <f3> 0f 1e fa 4
[ 87.979010] RSP: 0000:fffffe0000002fd8 EFLAGS: 00000002
[ 87.979024] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 00007f9988db059d
[ 87.979037] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000200
[ 87.979049] RBP: 00007ffe528b2a70 R08: 0000000000000003 R09: 0000000000401a60
[ 87.979062] R10: 00000000ffffffff R11: 0000000000000202 R12: 00000000004010e0
[ 87.979075] R13: 00007ffe528b2b50 R14: 0000000000000000 R15: 0000000000000000
[ 87.979093] ? asm_sysvec_thermal+0x30/0x30
[ 87.979123] ? asm_sysvec_thermal+0x30/0x30
[ 87.979153] </NMI>
[ 87.979158] <ENTRY_TRAMPOLINE>
[ 87.979165] </ENTRY_TRAMPOLINE>
[ 87.979171] ---[ end trace 0000000000000000 ]---
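
The loop above runs until it is interrupted by hand. A minimal sketch of a variant that stops at the first WARNING is shown here. The function names and the REPRO and LOG_CMD variables are hypothetical and are not part of the attached reproducer; the sketch only assumes that dmesg is readable and that the WARNING line names __perf_event_overflow, as in the trace above.

```shell
#!/bin/sh
# Sketch only: REPRO, LOG_CMD and both function names are assumptions,
# not part of the original report.

# Succeeds if the given log text contains the __perf_event_overflow WARNING.
has_overflow_warning() {
    printf '%s\n' "$1" | grep -q 'WARNING: .*__perf_event_overflow'
}

# Runs the reproducer up to $1 times and stops at the first WARNING.
run_until_warning() {
    max="$1"
    repro="${REPRO:-./repro}"      # reproducer binary to execute
    log_cmd="${LOG_CMD:-dmesg}"    # command that prints the kernel log
    i=0
    while [ "$i" -lt "$max" ]; do
        echo "$i times $repro"
        # The reproducer may be killed by SIGTRAP; keep looping regardless.
        $repro || true
        if has_overflow_warning "$($log_cmd)"; then
            echo "WARNING seen at iteration $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no WARNING in $max iterations"
    return 1
}
```

For example, "run_until_warning 1000" would stop as soon as the WARNING shows up. An older WARNING still present in the kernel log would match immediately, so the log likely needs to be cleared first (for example with "dmesg -C" as root).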

Before running the loop, I ran repro manually several times. Sometimes I
saw "Trace/breakpoint trap (core dumped)" after executing repro, but no
dmesg output was generated.

Anyway, all the dmesg output and the patch are attached. I hope it's helpful.

Thanks!
BR.