Re: WARNING in xfrm_state_fini

From: Steffen Klassert
Date: Mon Nov 27 2017 - 06:55:44 EST


On Tue, Nov 21, 2017 at 06:44:04PM -0800, Cong Wang wrote:
> On Tue, Nov 21, 2017 at 2:00 AM, syzbot
> <bot+427f0a9138719ba183c0d37d8c2d070567f7761a@xxxxxxxxxxxxxxxxxxxxxxxxx>
> wrote:
> > Hello,
> >
> > syzkaller hit the following crash on
> > c8a0739b185d11d6e2ca7ad9f5835841d1cfc765
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > C reproducer is attached
> > syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> > for information about syzkaller reproducers
> >
> >
> > Kernel panic - not syncing: panic_on_warn set ...
> >
> > CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 4.14.0+ #187
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Workqueue: netns cleanup_net
> > Call Trace:
> > __dump_stack lib/dump_stack.c:17 [inline]
> > dump_stack+0x194/0x257 lib/dump_stack.c:53
> > panic+0x1e4/0x41c kernel/panic.c:183
> > __warn+0x1dc/0x200 kernel/panic.c:547
> > report_bug+0x211/0x2d0 lib/bug.c:184
> > fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:177
> > fixup_bug arch/x86/kernel/traps.c:246 [inline]
> > do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:295
> > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:314
> > invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:926
> > RIP: 0010:xfrm_state_fini+0x46a/0x620 net/xfrm/xfrm_state.c:2323
> > RSP: 0018:ffff8801d9ce70f0 EFLAGS: 00010293
> > RAX: ffff8801d9cde580 RBX: ffff8801ccf50040 RCX: ffffffff845cb0fa
> > RDX: 0000000000000000 RSI: 1ffff1003b39bdd1 RDI: ffffed003b39ce10
> > RBP: ffff8801d9ce7248 R08: 1ffff1003b39cda4 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1003b39ce20
> > R13: ffff8801d9ce7220 R14: 1ffff1003b39ce24 R15: ffff8801ccf51500
> > xfrm_net_exit+0x25/0x30 net/xfrm/xfrm_policy.c:2957
>
> User-space uses proto==0 as a wildcard, but xfrm_id_proto_match()
> doesn't consider it a match for IPSEC_PROTO_ANY; in this case
> it should match all. Not sure if the following patch is the best way to
> fix it, or perhaps x->id.proto should be initialized to one of these 3
> values, but looking into ->init_temprop() it is not the case.

x->id is copied from the policy template, and it seems that we don't
validate the template's id when inserting the policy. iproute2
checks for a valid IPsec proto, but the kernel does not. I think
we should check the policy template and reject the insertion if the
proto is invalid.
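
To illustrate the idea, here is a rough userspace sketch, not a patch.
It assumes that the valid set is just AH, ESP and IPcomp (the real set
may need to be wider), and every sk_* name is made up for illustration.
sk_id_proto_match() only models the matching behaviour as described
above; it is not a copy of xfrm_id_proto_match().

```c
#include <stdbool.h>
#include <stdint.h>

/* IANA protocol numbers; SK_PROTO_ANY stands in for IPSEC_PROTO_ANY. */
enum {
	SK_PROTO_ESP  = 50,
	SK_PROTO_AH   = 51,
	SK_PROTO_COMP = 108,
	SK_PROTO_ANY  = 255,
};

/* True for the three protos that the wildcard is assumed to cover. */
static bool sk_is_ipsec_proto(uint8_t proto)
{
	return proto == SK_PROTO_AH ||
	       proto == SK_PROTO_ESP ||
	       proto == SK_PROTO_COMP;
}

/*
 * Model of the match: a userproto of 0 matches everything, an exact
 * value matches itself, and SK_PROTO_ANY matches only AH/ESP/IPcomp.
 * A state whose proto is 0 is therefore missed by an ANY flush.
 */
static bool sk_id_proto_match(uint8_t proto, uint8_t userproto)
{
	return userproto == 0 || proto == userproto ||
	       (userproto == SK_PROTO_ANY && sk_is_ipsec_proto(proto));
}

/*
 * Hypothetical check at policy insertion time: return 0 if the
 * template proto is acceptable, -1 otherwise. Which values count as
 * acceptable is an assumption of this sketch.
 */
static int sk_validate_tmpl_proto(uint8_t proto)
{
	return sk_is_ipsec_proto(proto) ? 0 : -1;
}
```

With a check like this, a template carrying proto 0 would be refused
up front, so no state with id.proto == 0 could be created from it and
then left behind by a flush for IPSEC_PROTO_ANY.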