Re: WARNING in bpf_cgroup_link_release

From: Andrii Nakryiko
Date: Wed Apr 15 2020 - 12:52:08 EST


On 4/15/20 4:57 AM, Daniel Borkmann wrote:
On 4/15/20 8:55 AM, syzbot wrote:
Hello,

syzbot found the following crash on:

Andrii, ptal.

HEAD commit:    1a323ea5 x86: get rid of 'errret' argument to __get_user_x..
git tree:       bpf-next
console output: https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_x_log.txt-3Fx-3D148ccb57e00000&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=vxqvl81C2rT6GOGdPyz8iQ&m=T2Ez0XmyIpHmEa_MPTTUOh61jMDXqwETtTaTbSe-2M4&s=-6XBbsNV1O4X5flrx4Yssfjc56d0qeSHgwHhd92UPJc&e=
kernel config:  https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_x_.config-3Fx-3D8c1e98458335a7d1&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=vxqvl81C2rT6GOGdPyz8iQ&m=T2Ez0XmyIpHmEa_MPTTUOh61jMDXqwETtTaTbSe-2M4&s=s5-1AlWtSiBvo66WN4_UXoXMGIGIqsoUCrmAnxNnfX0&e=
dashboard link: https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_bug-3Fextid-3D8a5dadc5c0b1d7055945&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=vxqvl81C2rT6GOGdPyz8iQ&m=T2Ez0XmyIpHmEa_MPTTUOh61jMDXqwETtTaTbSe-2M4&s=hAA0702qJH5EwRwvG0RKmj8FwIRm1O8hvmoS7ne5Dls&e=
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8a5dadc5c0b1d7055945@xxxxxxxxxxxxxxxxxxxxxxxxx

------------[ cut here ]------------
WARNING: CPU: 0 PID: 25081 at kernel/bpf/cgroup.c:796 bpf_cgroup_link_release+0x260/0x3a0 kernel/bpf/cgroup.c:796

This warning is triggered by __cgroup_bpf_detach() returning an error. It can do that in only two cases. The first is that the attached item is not found, which, from staring at the code some more, I don't see how can happen. The second is kmalloc() failing to allocate memory for the new effective prog array. The latter is a somewhat annoying behavior of cgroup detach, and I wonder if it makes sense to make that operation non-failing by replacing the detached program with a dummy no-op program, or at least to do so when allocating the new effective prog array fails. This wasn't triggered before because, when a user detaches explicitly and that fails, we just return the error to user space. For links, though, we have a WARN_ON: on release there is no way to propagate the error back, and there is little the user could do about it anyway.

So, should we change detach to be non-failing (assuming the program to be detached is found)?
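To make the proposal concrete, here is a minimal user-space sketch of a detach that fails only when the program is not attached and otherwise falls back to a dummy no-op program when allocation fails. Everything here is hypothetical and heavily simplified: `prog_fn`, `struct prog_array`, `alloc_array()`, `fail_alloc`, `deny_prog`, `audit_prog` and `detach_nofail()` are illustrative names, not kernel APIs. A single array stands in for both the attached list and the per-cgroup effective prog arrays, and concurrency (RCU-protected readers in the real kernel) is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model: a "BPF program" is just a function returning a verdict. */
typedef int (*prog_fn)(int ctx);

/* Dummy no-op placeholder program; always allows (returns 1). */
static int noop_prog(int ctx) { (void)ctx; return 1; }

/* Two example programs a user might attach (hypothetical). */
static int deny_prog(int ctx) { (void)ctx; return 0; }
static int audit_prog(int ctx) { return ctx >= 0; }

/* Simplified stand-in for an effective prog array. */
struct prog_array {
	size_t len;
	prog_fn progs[]; /* flexible array member */
};

/* Hypothetical fault-injection knob: when nonzero, allocation fails. */
static int fail_alloc;

static struct prog_array *alloc_array(size_t len)
{
	struct prog_array *a;

	if (fail_alloc)
		return NULL;
	a = malloc(sizeof(*a) + len * sizeof(prog_fn));
	if (a)
		a->len = len;
	return a;
}

/*
 * Detach @prog from *@effective. Fails only with -ENOENT when @prog is
 * not attached. If building the shrunken array fails, the program's slot
 * is overwritten with noop_prog in place instead of returning -ENOMEM.
 */
static int detach_nofail(struct prog_array **effective, prog_fn prog)
{
	struct prog_array *old, *new_arr;
	size_t i, j, idx;

	assert(effective != NULL && *effective != NULL);
	old = *effective;

	for (idx = 0; idx < old->len; idx++)
		if (old->progs[idx] == prog)
			break;
	if (idx == old->len)
		return -ENOENT;

	new_arr = alloc_array(old->len - 1);
	if (!new_arr) {
		/* Fallback: keep the old array, neutralize the detached program. */
		old->progs[idx] = noop_prog;
		return 0;
	}

	/* Normal path: copy every program except the detached one. */
	for (i = 0, j = 0; i < old->len; i++)
		if (i != idx)
			new_arr->progs[j++] = old->progs[i];
	*effective = new_arr;
	free(old);
	return 0;
}
```

The point of the fallback is that the in-place overwrite needs no allocation, so the only remaining error is -ENOENT, which a link release path could then treat as a genuine bug. In the real kernel such an overwrite would also have to be safe for concurrent readers of the effective array, which this sketch does not model.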

Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 25081 Comm: syz-executor.1 Not tainted 5.6.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 panic+0x2e3/0x75c kernel/panic.c:221
 __warn.cold+0x2f/0x35 kernel/panic.c:582
 report_bug+0x27b/0x2f0 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:175 [inline]
 fixup_bug arch/x86/kernel/traps.c:170 [inline]
 do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
 do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:bpf_cgroup_link_release+0x260/0x3a0 kernel/bpf/cgroup.c:796
Code: cf ff 5b 5d 41 5c e9 df 2a e9 ff e8 da 2a e9 ff 48 c7 c7 20 f4 9d 89 e8 de a0 3a 06 5b 5d 41 5c e9 c5 2a e9 ff e8 c0 2a e9 ff <0f> 0b e9 57 fe ff ff e8 a4 3d 26 00 e9 2a fe ff ff e8 9a 3d 26 00
RSP: 0018:ffffc900019a7dc0 EFLAGS: 00010246
RAX: 0000000000040000 RBX: ffff88808c3eac00 RCX: ffffc9000415a000
RDX: 0000000000040000 RSI: ffffffff8189bea0 RDI: 0000000000000005
RBP: 00000000fffffff4 R08: ffff88809055e000 R09: ffffed1015cc70f4
R10: ffffed1015cc70f3 R11: ffff8880ae63879b R12: ffff88808c3eac60
R13: ffff88808c3eac10 R14: ffffc90000f32000 R15: ffffffff817f8e60
 bpf_link_free+0x80/0x140 kernel/bpf/syscall.c:2217
 bpf_link_put+0x15e/0x1b0 kernel/bpf/syscall.c:2243
 bpf_link_release+0x33/0x40 kernel/bpf/syscall.c:2251
 __fput+0x2e9/0x860 fs/file_table.c:280
 task_work_run+0xf4/0x1b0 kernel/task_work.c:123
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x2fa/0x360 arch/x86/entry/common.c:165
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
 do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x45c889
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fddaf43fc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 00007fddaf4406d4 RCX: 000000000045c889
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
R13: 0000000000000078 R14: 00000000005043d2 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://urldefense.proofpoint.com/v2/url?u=https-3A__goo.gl_tpsmEJ&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=vxqvl81C2rT6GOGdPyz8iQ&m=T2Ez0XmyIpHmEa_MPTTUOh61jMDXqwETtTaTbSe-2M4&s=jBcp1pSQqrDLletxcTuqMoEa0bDhfqxI8vS5QM-yBGY&e= for more information about syzbot.
syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxxx

syzbot will keep track of this bug report. See:
https://urldefense.proofpoint.com/v2/url?u=https-3A__goo.gl_tpsmEJ-23status&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=vxqvl81C2rT6GOGdPyz8iQ&m=T2Ez0XmyIpHmEa_MPTTUOh61jMDXqwETtTaTbSe-2M4&s=jgiRP_4-vqlJiCpbXgMh0QfDg8iYJzW-i7MZS8KdapM&e= for how to communicate with syzbot.