Re: WARNING in free_event

From: Ian Rogers
Date: Mon Mar 11 2024 - 13:32:21 EST


On Mon, Mar 11, 2024 at 9:38 AM cheung wall <zzqq0103.hey@xxxxxxxxx> wrote:
>
> Hello,
>
>
>
> when using Healer to fuzz the latest Linux Kernel, the following crash
>
> was triggered on:
>
>
>
> HEAD commit: 90d35da658da8cff0d4ecbb5113f5fac9d00eb72 (tag: v6.8-rc7)
>
> git tree: upstream
>
> console output: https://drive.google.com/file/d/17rMhBgYu1sr1d8fiFPS2cPQ60VNJv8H_/view?usp=drive_link
>
> kernel config: https://drive.google.com/file/d/19VXB1YKwoBFpzjqTmm02jVi4-N9tNIvm/view?usp=drive_link
>
> C reproducer: https://drive.google.com/file/d/1fC55GmW0di5HgxQWhrf6Vj1EA2LXsaNJ/view?usp=drive_link
>
> Syzlang reproducer: https://drive.google.com/file/d/1uTbMUGRUK4kLiJHyK5dD2-_beUqx4V55/view?usp=drive_link
>
>
>
> If you fix this issue, please add the following tag to the commit:
>
> Reported-by: Qiang Zhang <zzqq0103.hey@xxxxxxxxx>
>
> ----------------------------------------------------------
>
>
> WARNING: CPU: 4 PID: 817954 at kernel/events/core.c:5254 free_event+0x8b/0xb0 kernel/events/core.c:5254
> Modules linked in:
> CPU: 4 PID: 817954 Comm: syz-executor.1 Not tainted 6.8.0-rc7 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> RIP: 0010:free_event+0x8b/0xb0 kernel/events/core.c:5254
> Code: b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 75 23 48 8b b5 38 02 00 00 48 89 ea 48 c7 c7 00 5d cd 8e e8 86 2e c3 ff 90 <0f> 0b 90 90 5d 41 5c e9 69 aa ef ff 4c 89 e7 e8 21 73 1c 00 eb d3
> RSP: 0018:ffff88811864fd78 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 1ffff110230c9fbd RCX: ffffffff8bd1ba08
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88811864fb70
> RBP: ffff88810e94f2d8 R08: 0000000000000001 R09: ffffed10230c9f6f
> R10: ffffed10230c9f6e R11: ffff88811864fb77 R12: ffff88810e94f510
> R13: ffff8881046744e8 R14: ffff88812ee40768 R15: ffff88812ee40000
> FS: 00007fc2f7eb0700(0000) GS:ffff8881c0200000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffca566ae18 CR3: 0000000164ee6001 CR4: 0000000000770ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> PKRU: 55555554
> Call Trace:
> <TASK>
> __do_sys_perf_event_open+0x4fd/0x1d50 kernel/events/core.c:12803
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xb4/0x1b0 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7fc2f8b62c8d
> Code: c3 e8 17 32 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fc2f7eafbf8 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
> RAX: ffffffffffffffda RBX: 00007fc2f8c9e050 RCX: 00007fc2f8b62c8d
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0
> RBP: 00007fc2f8bc04af R08: 0000000000000002 R09: 0000000000000000
> R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007fff421d61cf R14: 00007fff421d6370 R15: 00007fc2f7eafd80
> </TASK>

I find reading the reproducers hard work. Turn things like:
```
..
*(uint32_t*)0x200000c0 = 0;
*(uint32_t*)0x200000c4 = 0x80;
*(uint8_t*)0x200000c8 = 0;
*(uint8_t*)0x200000c9 = 0;
*(uint8_t*)0x200000ca = 0;
*(uint8_t*)0x200000cb = 0;
..
res = syscall(__NR_perf_event_open, /*attr=*/0x200000c0ul, /*pid=*/0,
/*cpu=*/0ul, /*group=*/-1, /*flags=*/0ul);
```
into the corresponding type/config. Then work out what's going on with
the case statements, is the test running in a hypervisor, etc.

I've seen a similar failure to this reported, but in that case the
event was a software cpu-clock event. The warning is:
```
..
[ 2519.138665] unexpected event refcount: 2; ptr=000000009c56b097
..
[ 2518.913630] ? free_event+0x8b/0xb0
[ 2518.914040] ? report_bug+0x2cc/0x430
[ 2518.914467] ? free_event+0x8b/0xb0
[ 2518.915380] ? handle_bug+0x5e/0xc0
[ 2518.915965] ? exc_invalid_op+0x25/0x70
[ 2518.916565] ? asm_exc_invalid_op+0x1a/0x20
[ 2518.917635] ? __warn_printk+0x158/0x200
[ 2518.918112] ? free_event+0x8b/0xb0
[ 2518.918538] ? free_event+0x8a/0xb0
[ 2518.918944] __do_sys_perf_event_open+0x4fd/0x1d50
..
```
That is free_event was called and we expected the reference count to
be 1, it wasn't and so the memory wasn't freed potentially causing a
memory leak (better than a potential use-after-free) - hopefully
whoever has the other reference count will free the memory. A suspect
for holding the reference count would be the interrupt handler, but
maybe it is another CPU. I think it is inherent in the nature of the
bug that there is a race condition. It'd be interesting to know if
this is 100% reproducible.

Thanks,
Ian