Re: KASAN: use-after-free Read in sock_release

From: Eric Dumazet
Date: Wed Nov 29 2017 - 15:49:26 EST


On Wed, 2017-11-29 at 11:37 -0800, Cong Wang wrote:
> (Cc'ing fs people...)
>
> On Wed, Nov 29, 2017 at 12:33 AM, syzbot
> <bot+9abea25706ae35022385a41f61e579ed66e88a3f@xxxxxxxxxxxxxxxxxxxxxxx
> om>
> wrote:
> > Hello,
> >
> > syzkaller hit the following crash on
> > 1d3b78bbc6e983fabb3fbf91b76339bf66e4a12c
> > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> >
> > Unfortunately, I don't have any reproducer for this bug yet.
> >
> >
> > device syz3 left promiscuous mode
> > device syz3 entered promiscuous mode
> > ==================================================================
> > BUG: KASAN: use-after-free in sock_release+0x1c6/0x1e0
> > net/socket.c:601
> > Read of size 8 at addr ffff8801c8dd1d10 by task syz-executor4/31085
> >
> > CPU: 0 PID: 31085 Comm: syz-executor4 Not tainted 4.14.0+ #129
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS
> > Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:17 [inline]
> >  dump_stack+0x194/0x257 lib/dump_stack.c:53
> >  print_address_description+0x73/0x250 mm/kasan/report.c:252
> >  kasan_report_error mm/kasan/report.c:351 [inline]
> >  kasan_report+0x25b/0x340 mm/kasan/report.c:409
> >  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
> >  sock_release+0x1c6/0x1e0 net/socket.c:601
> >  sock_close+0x16/0x20 net/socket.c:1125
> >  __fput+0x333/0x7f0 fs/file_table.c:210
> >  ____fput+0x15/0x20 fs/file_table.c:244
> >  task_work_run+0x199/0x270 kernel/task_work.c:113
> >  exit_task_work include/linux/task_work.h:22 [inline]
> >  do_exit+0x9bb/0x1ae0 kernel/exit.c:865
> >  do_group_exit+0x149/0x400 kernel/exit.c:968
> >  get_signal+0x73f/0x16c0 kernel/signal.c:2335
> >  do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:809
> >  exit_to_usermode_loop+0x214/0x310 arch/x86/entry/common.c:158
> >  prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
> >  syscall_return_slowpath+0x490/0x550 arch/x86/entry/common.c:264
> >  entry_SYSCALL_64_fastpath+0x94/0x96
> > RIP: 0033:0x452879
> > RSP: 002b:00007fb1c2435ce8 EFLAGS: 00000246 ORIG_RAX:
> > 00000000000000ca
> > RAX: fffffffffffffe00 RBX: 0000000000758100 RCX: 0000000000452879
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000758100
> > RBP: 0000000000758100 R08: 0000000000000304 R09: 00000000007580d8
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000a6f7ff R14: 00007fb1c24369c0 R15: 000000000000000e
> >
> > Allocated by task 31066:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >  set_track mm/kasan/kasan.c:459 [inline]
> >  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
> >  kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3613
> >  kmalloc include/linux/slab.h:499 [inline]
> >  sock_alloc_inode+0xb4/0x300 net/socket.c:253
> >  alloc_inode+0x65/0x180 fs/inode.c:208
> >  new_inode_pseudo+0x69/0x190 fs/inode.c:890
> >  sock_alloc+0x41/0x270 net/socket.c:565
> >  __sock_create+0x148/0x850 net/socket.c:1225
> >  sock_create net/socket.c:1301 [inline]
> >  SYSC_socket net/socket.c:1331 [inline]
> >  SyS_socket+0xeb/0x200 net/socket.c:1311
> >  entry_SYSCALL_64_fastpath+0x1f/0x96
> >
> > Freed by task 3039:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >  set_track mm/kasan/kasan.c:459 [inline]
> >  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
> >  __cache_free mm/slab.c:3491 [inline]
> >  kfree+0xca/0x250 mm/slab.c:3806
> >  __rcu_reclaim kernel/rcu/rcu.h:190 [inline]
> >  rcu_do_batch kernel/rcu/tree.c:2758 [inline]
> >  invoke_rcu_callbacks kernel/rcu/tree.c:3012 [inline]
> >  __rcu_process_callbacks kernel/rcu/tree.c:2979 [inline]
> >  rcu_process_callbacks+0xe79/0x17d0 kernel/rcu/tree.c:2996
> >  __do_softirq+0x29d/0xbb2 kernel/softirq.c:285
>
> This looks more like a fs issue than network, my fs knowledge
> is not good enough to justify why the hell the inode could be
> destroyed before we release the fd.
>
> My _guess_ is that it is because we defer the ____fput() to a
> task work. If this is the case, then fs layer is not guilty for this.
>
> On the other hand, if we have to blame net layer, it does look
> suspicious on the RCU usage in sock_release() where we
> claim RCU protection but I don't see we hold any RCU lock
> there.

There is RCU protection for sock->wq, and the 1 in
rcu_dereference_protected(sock->wq, 1) is there because we do not have a
lockdep-friendly way to express that we are the last user of sock
and about to free it.
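
To make the role of that constant condition concrete, here is a minimal
userspace sketch. It is not the kernel implementation: the macro below is a
stand-in that only asserts its condition (the real one hands the condition
to lockdep), the structures are cut down to the single field the check
reads, and sketch_release_check() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in (assumption): check the condition, return the pointer. */
#define rcu_dereference_protected(p, c) (assert(c), (p))

struct fasync_struct;                   /* opaque in this sketch */

struct socket_wq {
    struct fasync_struct *fasync_list;
};

struct socket {
    struct socket_wq *wq;               /* RCU-managed pointer in the kernel */
};

/*
 * Hypothetical helper mirroring the check quoted from sock_release().
 * The caller is assumed to be the last user of sock, so no read-side
 * lock is taken and the condition is simply the constant 1.
 * Returns true when the "fasync list not empty" message would fire.
 */
static bool sketch_release_check(struct socket *sock)
{
    struct socket_wq *wq = rcu_dereference_protected(sock->wq, 1);

    return wq->fasync_list != NULL;
}
```

With a condition of 1 the check can never complain, so the correctness of
the access rests entirely on the "last user" assumption holding.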


> Also, the code that dereferences sock->wq is pretty much
> useless now, at least I don't see it catches any bug though.
>
>
> diff --git a/net/socket.c b/net/socket.c
> index 42d8e9c9ccd5..b2390b5591a9 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -598,9 +598,6 @@ void sock_release(struct socket *sock)
>                 module_put(owner);
>         }
>
> -       if (rcu_dereference_protected(sock->wq, 1)->fasync_list)
> -               pr_err("%s: fasync list not empty!\n", __func__);
> -
>

At this point, sock->wq must be valid; it is freed later (by us).

This really looks like some other bug, and a late effect.
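
As a toy illustration of what "late effect" means here (a userspace model
under stated assumptions, not kernel code and not a diagnosis): a single
counter stands in for the inode reference count, a flag stands in for the
wq having been freed, and every toy_* name is hypothetical. The real
teardown is spread across more paths than this model shows.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one reference count guards both the inode and its wq. */
struct toy_sock {
    int  inode_refs;    /* stands in for the inode reference count */
    bool wq_freed;      /* set where the kernel would free the wq */
};

/* Stands in for dropping an inode reference; the last put frees wq. */
static void toy_iput(struct toy_sock *s)
{
    assert(s->inode_refs > 0);
    if (--s->inode_refs == 0)
        s->wq_freed = true;
}

/*
 * Stands in for the release path: read wq first, then drop our reference.
 * Returns false if wq was already gone when the read happened, which is
 * the situation the KASAN report describes.
 */
static bool toy_sock_release(struct toy_sock *s)
{
    bool wq_was_valid = !s->wq_freed;

    if (wq_was_valid)
        toy_iput(s);    /* "freed later (by us)" */
    return wq_was_valid;
}
```

In this model the release path only misbehaves if some earlier, unrelated
path has already dropped the last reference, so the report in the release
path is where the damage is noticed rather than where it was caused.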