Re: BUG: KASAN: use-after-free in fib_table_flush

From: Ido Schimmel
Date: Sun Dec 17 2017 - 11:07:29 EST


+Alexander

On Sun, Dec 17, 2017 at 08:55:57PM +0800, Fengguang Wu wrote:
> Hello,
>
> FYI this happens in mainline kernel 4.15.0-rc3.
> It looks like a new regression.
>
> It occurs in 4 out of 28 boots.
>
> [ 166.090516] ==================================================================
> [ 166.092419] BUG: KASAN: use-after-free in fib_table_flush+0x76c/0x870:
> fib_table_flush at net/ipv4/fib_trie.c:1868
> [ 166.092907] Read of size 8 at addr ffff880012fc0b18 by task kworker/u2:3/173
> [ 166.093402]
> [ 166.093528] CPU: 0 PID: 173 Comm: kworker/u2:3 Not tainted 4.15.0-rc3 #31
> [ 166.094018] Workqueue: netns cleanup_net
> [ 166.094298] Call Trace:
> [ 166.094489] print_address_description+0xa6/0x370:
> print_address_description at mm/kasan/report.c:253
> [ 166.094867] ? fib_table_flush+0x76c/0x870:
> fib_table_flush at net/ipv4/fib_trie.c:1868
> [ 166.095159] kasan_report+0x226/0x330:
> kasan_report_error at mm/kasan/report.c:352
> (inlined by) kasan_report at mm/kasan/report.c:409
> [ 166.095420] fib_table_flush+0x76c/0x870:
> fib_table_flush at net/ipv4/fib_trie.c:1868
> [ 166.095698] ? fib_table_flush_external+0x5a0/0x5a0:
> fib_table_flush at net/ipv4/fib_trie.c:1836
> [ 166.096067] ? ip_fib_net_exit+0x94/0x360:
> ip_fib_net_exit at net/ipv4/fib_frontend.c:1313 (discriminator 16)
> [ 166.096350] ip_fib_net_exit+0x228/0x360:
> ip_fib_net_exit at net/ipv4/fib_frontend.c:1316
> [ 166.096629] ? ip_fib_net_exit+0x360/0x360:
> fib_net_exit at net/ipv4/fib_frontend.c:1355
> [ 166.096930] ops_exit_list+0xa8/0x160
> [ 166.097233] cleanup_net+0x414/0x860:
> cleanup_net at net/core/net_namespace.c:483 (discriminator 9)
> [ 166.097487] ? net_drop_ns+0x80/0x80:
> cleanup_net at net/core/net_namespace.c:439
> [ 166.097748] ? kvm_sched_clock_read+0x5/0x10:
> kvm_sched_clock_read at arch/x86/kernel/kvmclock.c:101
> [ 166.098051] ? native_sched_clock_from_tsc+0x40/0x70:
> __preempt_count_dec_and_test at arch/x86/include/asm/preempt.h:91
> (inlined by) cyc2ns_read_end at arch/x86/kernel/tsc.c:81
> (inlined by) cycles_2_ns at arch/x86/kernel/tsc.c:135
> (inlined by) native_sched_clock_from_tsc at arch/x86/kernel/tsc.c:219
> [ 166.098399] ? sched_clock_cpu+0xf/0x70:
> sched_clock_cpu at kernel/sched/clock.c:363
> [ 166.098672] ? __lock_acquire+0x3b2/0x1fc0
> [ 166.099054] ? lock_downgrade+0x6a0/0x6a0:
> lock_release at kernel/locking/lockdep.c:4013
> [ 166.099337] ? lock_acquire+0x117/0x260:
> get_current at arch/x86/include/asm/current.h:15
> (inlined by) lock_acquire at kernel/locking/lockdep.c:4006
> [ 166.099609] ? process_one_work+0x70f/0x11c0:
> process_one_work at kernel/workqueue.c:2087
> [ 166.099938] process_one_work+0x791/0x11c0:
> process_one_work at kernel/workqueue.c:2118
> [ 166.100229] ? kvm_sched_clock_read+0x5/0x10:
> kvm_sched_clock_read at arch/x86/kernel/kvmclock.c:101
> [ 166.100532] ? sched_clock+0x2d/0x40:
> paravirt_sched_clock at arch/x86/include/asm/paravirt.h:174
> (inlined by) sched_clock at arch/x86/kernel/tsc.c:227
> [ 166.100792] ? cancel_delayed_work_sync+0x20/0x20:
> process_one_work at kernel/workqueue.c:2014
> [ 166.101123] worker_thread+0xe8/0x1070:
> __read_once_size at include/linux/compiler.h:183
> (inlined by) list_empty at include/linux/list.h:203
> (inlined by) worker_thread at kernel/workqueue.c:2247
> [ 166.101392] ? __kthread_parkme+0x164/0x230:
> __kthread_parkme at kernel/kthread.c:188
> [ 166.101689] ? process_one_work+0x11c0/0x11c0:
> worker_thread at kernel/workqueue.c:2189
> [ 166.102006] kthread+0x2fd/0x400:
> kthread at kernel/kthread.c:238
> [ 166.102240] ? kthread_create_on_node+0xf0/0xf0:
> kthread at kernel/kthread.c:198
> [ 166.102561] ret_from_fork+0x1f/0x30:
> ret_from_fork at arch/x86/entry/entry_64.S:447
> [ 166.102855]
> [ 166.102972] Allocated by task 1907:
> [ 166.103235] __kmalloc+0xf6/0x1a0:
> __kmalloc at mm/slub.c:3765
> [ 166.103475] fib_trie_table+0xe8/0x240:
> fib_trie_table at net/ipv4/fib_trie.c:2081
> [ 166.103748] fib_net_init+0x1bc/0x570:
> fib4_rules_init at net/ipv4/fib_frontend.c:59
> (inlined by) ip_fib_net_init at net/ipv4/fib_frontend.c:1287
> (inlined by) fib_net_init at net/ipv4/fib_frontend.c:1335
> [ 166.104032] ops_init+0x1c0/0x360:
> ops_init at net/core/net_namespace.c:119
> [ 166.104269] setup_net+0x23c/0x530:
> setup_net at net/core/net_namespace.c:296
> [ 166.104512] copy_net_ns+0x170/0x350:
> copy_net_ns at net/core/net_namespace.c:420
> [ 166.104779] create_new_namespaces+0x343/0x730:
> create_new_namespaces at kernel/nsproxy.c:107
> [ 166.105091] unshare_nsproxy_namespaces+0xa1/0x150:
> unshare_nsproxy_namespaces at kernel/nsproxy.c:206 (discriminator 4)
> [ 166.105427] SyS_unshare+0x338/0x6c0
> [ 166.105682] do_syscall_64+0x21f/0xb80:
> do_syscall_64 at arch/x86/entry/common.c:285
> [ 166.105954] return_from_SYSCALL_64+0x0/0x65:
> return_from_SYSCALL_64 at arch/x86/entry/entry_64.S:259
> [ 166.106253]
> [ 166.106367] Freed by task 11:
> [ 166.106581] kfree+0x102/0x1d0:
> slab_free at mm/slub.c:2973
> (inlined by) kfree at mm/slub.c:3899
> [ 166.106838] rcu_do_batch+0x331/0x7f0:
> rcu_lock_release at include/linux/rcupdate.h:249
> (inlined by) __rcu_reclaim at kernel/rcu/rcu.h:196
> (inlined by) rcu_do_batch at kernel/rcu/tree.c:2758
> [ 166.107102] rcu_cpu_kthread+0x12a/0x160:
> rcu_preempt_do_callbacks at kernel/rcu/tree_plugin.h:687
> (inlined by) rcu_kthread_do_work at kernel/rcu/tree_plugin.h:1142
> (inlined by) rcu_cpu_kthread at kernel/rcu/tree_plugin.h:1184
> [ 166.107381] smpboot_thread_fn+0x3c1/0x820:
> smpboot_thread_fn at kernel/smpboot.c:164
> [ 166.107669] kthread+0x2fd/0x400:
> kthread at kernel/kthread.c:238
> [ 166.107928] ret_from_fork+0x1f/0x30:
> ret_from_fork at arch/x86/entry/entry_64.S:447
> [ 166.108181]
> [ 166.108295] The buggy address belongs to the object at ffff880012fc0ae0
> [ 166.108295] which belongs to the cache kmalloc-64 of size 64
> [ 166.109179] The buggy address is located 56 bytes inside of
> [ 166.109179] 64-byte region [ffff880012fc0ae0, ffff880012fc0b20)

Hi Alexander,

Note that CONFIG_IP_MULTIPLE_TABLES is disabled, so both the main and
local tables are allocated during init and share the same trie.
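Here is a minimal userspace sketch (not kernel code) that makes the sharing concrete. It assumes, as a simplification, that the local table just points at trie storage owned by the main table. struct model_table and model_table_alloc() are hypothetical names, and the real fib_table layout is more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: tb_data points either at the table's own
 * storage or at storage owned by another table. */
struct model_table {
    unsigned int id;
    void *tb_data;          /* trie root used by lookups and flushes */
    unsigned long data[4];  /* storage used when the table owns its trie */
};

/* Allocate a table. If 'alias' is non-NULL, reuse its trie storage
 * instead of our own, so the two tables share one trie. */
static struct model_table *model_table_alloc(unsigned int id,
                                             struct model_table *alias)
{
    /* The model only supports aliasing a table that owns its storage. */
    assert(!alias || alias->tb_data == (void *)alias->data);

    struct model_table *tb = calloc(1, sizeof(*tb));
    if (!tb)
        return NULL;
    tb->id = id;
    tb->tb_data = alias ? alias->tb_data : (void *)tb->data;
    return tb;
}
```

If main (ID 254) is allocated first and local (ID 255) is then allocated with main as the alias, both tables end up using main's storage. Freeing main therefore also invalidates the trie that local walks.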

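I think what happens is that ip_fib_net_exit() frees the main table, and
with it the shared trie, via an RCU callback that is scheduled before the
local table is iterated over. If the grace period ends before the local
table is flushed, fib_table_flush() reads freed memory, resulting in the
use-after-free.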

I can reliably trigger the bug by adding synchronize_rcu() at the end of
each iteration of the loop in ip_fib_net_exit().
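The suspected race can be modeled the same way. In this hypothetical sketch each free is only queued, standing in for an RCU callback, and takes effect at a simulated grace period. The force_gp flag plays the role of the synchronize_rcu() added at the end of each loop iteration. All names are illustrative, and the tables are walked main first, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative, not kernel API. */
struct model_table {
    struct model_table *owner; /* table whose storage holds the trie */
    bool free_pending;         /* free queued, grace period not over */
    bool freed;                /* storage released (callback ran) */
};

/* Stand-in for the end of a grace period: run every queued free. */
static void model_grace_period(struct model_table **tbs, int n)
{
    for (int i = 0; i < n; i++) {
        if (tbs[i]->free_pending) {
            tbs[i]->free_pending = false;
            tbs[i]->freed = true;
        }
    }
}

/* Forward walk: flush each table, then queue its free. With force_gp
 * set, a grace period elapses after every iteration. Returns false if
 * a flush reads storage that was already freed. */
static bool model_exit_forward(bool force_gp)
{
    struct model_table main_tb = { .owner = NULL };
    struct model_table local_tb = { .owner = &main_tb }; /* borrows trie */
    struct model_table *slots[2] = { &main_tb, &local_tb }; /* main first */
    bool ok = true;

    main_tb.owner = &main_tb; /* main owns the shared trie */

    for (int i = 0; i < 2; i++) {
        struct model_table *tb = slots[i];

        if (tb->owner->freed)
            ok = false;               /* the flush would be a use-after-free */
        assert(!tb->free_pending && !tb->freed); /* each table freed once */
        tb->free_pending = true;      /* deferred free, like call_rcu() */
        if (force_gp)
            model_grace_period(slots, 2);
    }
    model_grace_period(slots, 2);     /* remaining callbacks run eventually */
    return ok;
}
```

model_exit_forward(false) returns true because main's free is still pending when the local table is flushed, while model_exit_forward(true) returns false. This is consistent with a failure that shows up only on some boots and becomes reliable once a grace period is forced between iterations.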

The problem goes away if we iterate over the tables in reverse order,
which is symmetric to fib4_rules_init().
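The proposed ordering can be checked against a similar model. This sketch assumes a two-slot table hash indexed by id & 1, so the main table (ID 254) sits in slot 0 and the local table (ID 255) in slot 1. It frees each table immediately, which is the worst case where the grace period has already elapsed. The MODEL_* constants and model_exit() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HASHSZ          2   /* assumed slot count for this model */
#define MODEL_RT_TABLE_MAIN   254
#define MODEL_RT_TABLE_LOCAL  255

struct model_table {
    unsigned int id;
    struct model_table *owner; /* table whose storage holds the trie */
    bool freed;
};

/* Walk the slots forward or in reverse, flushing and then freeing each
 * table at once. Returns true if no flush touched freed storage. */
static bool model_exit(bool reverse)
{
    struct model_table main_tb = { .id = MODEL_RT_TABLE_MAIN };
    struct model_table local_tb = { .id = MODEL_RT_TABLE_LOCAL };
    struct model_table *slots[MODEL_HASHSZ];

    main_tb.owner = &main_tb;   /* main owns the shared trie */
    local_tb.owner = &main_tb;  /* local only borrows it */
    slots[main_tb.id & (MODEL_HASHSZ - 1)] = &main_tb;   /* slot 0 */
    slots[local_tb.id & (MODEL_HASHSZ - 1)] = &local_tb; /* slot 1 */

    for (int k = 0; k < MODEL_HASHSZ; k++) {
        struct model_table *tb = slots[reverse ? MODEL_HASHSZ - 1 - k : k];

        if (tb->owner->freed)
            return false;       /* the flush would be a use-after-free */
        assert(!tb->freed);     /* each slot is visited exactly once */
        tb->freed = true;
    }
    return true;
}
```

Walking the slots forward flushes the local table after main's storage is gone, so model_exit(false) returns false. Walking them in reverse destroys the local table, which only borrows the trie, before the main table that owns it, so model_exit(true) returns true.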

What do you think?