Re: KASAN: use-after-free Read in tcf_block_find

From: Dmitry Vyukov
Date: Thu Sep 27 2018 - 04:11:10 EST


On Thu, Sep 27, 2018 at 10:06 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Wed, Sep 26, 2018 at 11:55 PM, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>>> >> Hello,
>>> >>
>>> >> syzbot found the following crash on:
>>> >>
>>> >> HEAD commit: 4b1bd6976945 net: phy: marvell: Fix build.
>>> >> git tree: net-next
>>> >> console output: https://syzkaller.appspot.com/x/log.txt?x=16f763fa400000
>>> >> kernel config: https://syzkaller.appspot.com/x/.config?x=443816db871edd66
>>> >> dashboard link: https://syzkaller.appspot.com/bug?extid=37b8770e6d5a8220a039
>>> >> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>> >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17a5614e400000
>>> >> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=141a532a400000
>>> >>
>>> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> >> Reported-by: syzbot+37b8770e6d5a8220a039@xxxxxxxxxxxxxxxxxxxxxxxxx
>>> >>
>>> >> IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
>>> >> 8021q: adding VLAN 0 to HW filter on device team0
>>> >> ==================================================================
>>> >> BUG: KASAN: use-after-free in tcf_block_find+0x9d1/0xb90
>>> >> net/sched/cls_api.c:646
>>> >> Read of size 4 at addr ffff8801cc126978 by task syz-executor002/5646
>>> >>
>>> >> CPU: 1 PID: 5646 Comm: syz-executor002 Not tainted 4.19.0-rc5+ #232
>>> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>>> >> Google 01/01/2011
>>> >> Call Trace:
>>> >> __dump_stack lib/dump_stack.c:77 [inline]
>>> >> dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
>>> >> print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
>>> >> kasan_report_error mm/kasan/report.c:354 [inline]
>>> >> kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
>>> >> __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
>>> >> tcf_block_find+0x9d1/0xb90 net/sched/cls_api.c:646
>>> >
>>> > Hmm. looks like missing a rcu_dereference() here:
>>> >
>>> > if (!*parent) {
>>> > *q = dev->qdisc;
>>> > *parent = (*q)->handle;
>>> >
>>>
>>> We hold RTNL here.
>>>
>>> Have we already done the changes allowing dev->qdisc being changed
>>> without RTNL being required ?
>>
>> That transition is not completed yet, but holding RTNL doesn't help
>> any more after we now free qdisc in rcu callback.
>>
>> More interestingly, rcu read lock is already held in this context,
>> so it seems we still have to use rcu_deference() to read dev->qdisc
>> and of course mark it with __rcu too.
>
> On hardware level rcu_deference() only affects Alpha. On software
> level, it can affect all archs due to e.g. a re-read of the variable.
> Do we see the potential for compilation that would lead to the UAF
> here? A missed rcu_deference() would not be the first thing that I
> would suspect for UAF on x86. It's more likely some bolder bug.


Would a stack trace for call_rcu be helpful here? I have this idea for
a long time, but never get around to implementing it:
https://bugzilla.kernel.org/show_bug.cgi?id=198437

Also FWIW I recently used the following hack for another net bug. It
made that other bug involving call_rcu way more likely to fire. Maybe
it will be helpful here too.

diff --git a/net/core/dst.c b/net/core/dst.c
index 81ccf20e28265..591a8d0aca545 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -187,8 +187,16 @@ void dst_release(struct dst_entry *dst)
if (unlikely(newrefcnt < 0))
net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
__func__, dst, newrefcnt);
- if (!newrefcnt)
- call_rcu(&dst->rcu_head, dst_destroy_rcu);
+ if (!newrefcnt) {
+ if (lock_is_held(&rcu_bh_lock_map) ||
+ lock_is_held(&rcu_lock_map) ||
+ lock_is_held(&rcu_sched_lock_map)) {
+ call_rcu(&dst->rcu_head, dst_destroy_rcu);
+ } else {
+ synchronize_rcu();
+ dst_destroy_rcu(&dst->rcu_head);
+ }
+ }
}
}