Panic in nf_ct_seq_adjust (be2net, synproxy, heavy load)

From: Grzegorz Nosek
Date: Tue Feb 16 2016 - 14:14:49 EST


Hi,

I have finally managed to capture a stack trace from an intermittent crash we've been observing under heavy load.

There are two identical machines with dual-port Emulex 10 Gbps NICs running as an active-passive firewall cluster (though more like passive-passive in reality). As far as I know, at the time of the reboot there was a fairly large DoS attack (6 Gbps) on the same network segment, not directed at the firewalls themselves. The firewalls were being tested with a ~80 Mbps stream of small HTTP requests forwarded with SYNPROXY, with conntrackd and some traffic shaping enabled.

Unfortunately, this trace comes from a Debian distro kernel (3.16), as I haven't yet upgraded this machine and I don't have a serial console on the other one at the moment. The other node is running 4.4.1 and rebooted at the same time, give or take a few seconds: /proc/uptime says it has been up 15 seconds longer. netconsole stays silent during reboots.

[436190.637310] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[436190.674698] IP: [<ffffffffa014af73>] nf_ct_seq_adjust+0xa3/0x3a0 [nf_conntrack]
[436190.709683] PGD 0
[436190.719785] Oops: 0000 [#1] SMP
[436190.735431] Modules linked in: ipt_SYNPROXY nf_synproxy_core binfmt_misc xt_CLASSIFY xt_mac iptable_filter macvlan ip_set_hash_net cls_u32 sch_red sch_htb nf_conntrack_netlink xt_set ip_set nfnetlink iptable_mangle xt_tcpudp xt_CT xt_conntrack iptable_raw ip_tables arptable_filter arp_tables x_tables intel_powerclamp coretemp kvm_intel kvm crc32_pclmul aesni_intel aes_x86_64 joydev lrw gf128mul glue_helper ttm drm_kms_helper hpwdt drm i2c_algo_bit i2c_core hpilo psmouse evdev iTCO_wdt acpi_power_meter pcspkr serio_raw iTCO_vendor_support lpc_ich ablk_helper cryptd shpchp i7core_edac nf_conntrack_ftp mfd_core button edac_core nf_conntrack_ipv4 nf_defrag_ipv4 processor nf_conntrack ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 crc16 mbcache jbd2 dm_mod hid_generic usbhid hid sd_mod crc_t10dif crct10dif_generic sg crct10dif_pclmul crct10dif_common crc32c_intel be2iscsi ehci_pci uhci_hcd iscsi_boot_sysfs libiscsi ehci_hcd usbcore scsi_transport_iscsi hpsa usb_common be2net scsi_mod thermal vxlan thermal_sys
[436191.160517] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G I 3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u2
[436191.208587] Hardware name: HP ProLiant BL460c G7, BIOS I27 07/02/2013
[436191.239665] task: ffff88020af6c110 ti: ffff88020af78000 task.ti: ffff88020af78000
[436191.274842] RIP: 0010:[<ffffffffa014af73>] [<ffffffffa014af73>] nf_ct_seq_adjust+0xa3/0x3a0 [nf_conntrack]
[436191.321186] RSP: 0018:ffff88020ba43c40 EFLAGS: 00010246
[436191.346218] RAX: 0000000000030003 RBX: ffff880202931300 RCX: 0000000000000003
[436191.381252] RDX: 00000000ff93df09 RSI: 0000000000000028 RDI: ffff8802064bdb0c
[436191.414649] RBP: ffff8802064bdb08 R08: ffffffff81459ea0 R09: 0000000000000020
[436191.449511] R10: ffff880203600000 R11: ffff8800e7b38000 R12: 0000000000000028
[436191.482769] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8801e21b3062
[436191.516924] FS: 0000000000000000(0000) GS:ffff88020ba40000(0000) knlGS:0000000000000000
[436191.554343] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[436191.581010] CR2: 0000000000000004 CR3: 0000000001813000 CR4: 00000000000007e0
[436191.613999] Stack:
[436191.624137] ffffffff00000014 ffffc9000d8c1800 ffff8802064bdb0c ffffffff00000000
[436191.660568] ffff8801f92b0af0 0000000000000000 ffff880202931300 ffff8802064bdb08
[436191.696262] ffff880203600000 0000000000000000 ffff880202931300 ffffffff818e95a0
[436191.730498] Call Trace:
[436191.742505] <IRQ>
[436191.752229] [<ffffffffa0206bd3>] ? ipv4_confirm+0xa3/0xf0 [nf_conntrack_ipv4]
[436191.786089] [<ffffffff81459ea0>] ? ip_fragment+0x880/0x880
[436191.813367] [<ffffffff8144f575>] ? nf_iterate+0x65/0xa0
[436191.838356] [<ffffffff81459ea0>] ? ip_fragment+0x880/0x880
[436191.864832] [<ffffffff8144f626>] ? nf_hook_slow+0x76/0x130
[436191.891480] [<ffffffff81459ea0>] ? ip_fragment+0x880/0x880
[436191.917857] [<ffffffff8145ba42>] ? ip_output+0x82/0x90
[436191.943454] [<ffffffff8141ebc3>] ? __netif_receive_skb_core+0x543/0x750
[436191.974406] [<ffffffff8141ee4f>] ? netif_receive_skb_internal+0x1f/0x80
[436192.006268] [<ffffffff8141f76a>] ? napi_gro_frags+0x1ba/0x2c0
[436192.032907] [<ffffffffa00dc2f4>] ? be_process_rx+0x2c4/0x780 [be2net]
[436192.064273] [<ffffffffa00dc9a2>] ? be_poll+0x1f2/0x400 [be2net]
[436192.091625] [<ffffffff8141f1d0>] ? net_rx_action+0x140/0x240
[436192.119082] [<ffffffff8106c681>] ? __do_softirq+0xf1/0x290
[436192.144594] [<ffffffff8106ca55>] ? irq_exit+0x95/0xa0
[436192.168465] [<ffffffff81516ae2>] ? do_IRQ+0x52/0xe0
[436192.192353] [<ffffffff8151492d>] ? common_interrupt+0x6d/0x6d
[436192.219491] <EOI>
[436192.228773] [<ffffffff8108aded>] ? __hrtimer_start_range_ns+0x1cd/0x390
[436192.261747] [<ffffffff813dfbbf>] ? cpuidle_enter_state+0x4f/0xc0
[436192.290427] [<ffffffff813dfbb8>] ? cpuidle_enter_state+0x48/0xc0
[436192.319838] [<ffffffff810a80d8>] ? cpu_startup_entry+0x2f8/0x400
[436192.348103] [<ffffffff81042c5f>] ? start_secondary+0x20f/0x2d0
[436192.376655] Code: 00 00 00 88 44 24 08 83 e0 01 48 8d 14 00 48 01 d0 4d 8d 74 85 00 48 8d 45 04 48 89 c7 48 89 44 24 10 e8 a1 88 3c e1 41 8b 57 04 <41> 8b 76 04 45 8b 46 08 41 8b 7f 08 89 d1 0f c9 41 39 0e 0f 88
[436192.464643] RIP [<ffffffffa014af73>] nf_ct_seq_adjust+0xa3/0x3a0 [nf_conntrack]
[436192.499548] RSP <ffff88020ba43c40>
[436192.515631] CR2: 0000000000000004
[436192.531367] ---[ end trace 84529a434dcea659 ]---
[436192.531372] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[436192.531380] IP: [<ffffffffa014af73>] nf_ct_seq_adjust+0xa3/0x3a0 [nf_conntrack]
[436192.531381] PGD 0
[436192.531382] Oops: 0000 [#2] SMP
[436192.531402] Modules linked in: ipt_SYNPROXY nf_synproxy_core binfmt_misc xt_CLASSIFY xt_mac iptable_filter macvlan ip_set_hash_net cls_u32 sch_red sch_htb nf_conntrack_netlink xt_set ip_set nfnetlink iptable_mangle xt_tcpudp xt_CT xt_conntrack iptable_raw ip_tables arptable_filter arp_tables x_tables intel_powerclamp coretemp kvm_intel kvm crc32_pclmul aesni_intel aes_x86_64 joydev lrw gf128mul glue_helper ttm drm_kms_helper hpwdt drm i2c_algo_bit i2c_core hpilo psmouse evdev iTCO_wdt acpi_power_meter pcspkr serio_raw iTCO_vendor_support lpc_ich ablk_helper cryptd shpchp i7core_edac nf_conntrack_ftp mfd_core button edac_core nf_conntrack_ipv4 nf_defrag_ipv4 processor nf_conntrack ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 ext4 crc16 mbcache jbd2 dm_mod hid_generic usbhid hid sd_mod crc_t10dif crct10dif_generic sg crct10dif_pclmul crct10dif_common crc32c_intel be2iscsi ehci_pci uhci_hcd iscsi_boot_sysfs libiscsi ehci_hcd usbcore scsi_transport_iscsi hpsa usb_common be2net scsi_mod thermal vxlan thermal_sys
[436192.531411] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D I 3.16.0-4-amd64 #1 Debian 3.16.7-ckt20-1+deb8u2
[436192.531411] Hardware name: HP ProLiant BL460c G7, BIOS I27 07/02/2013
[436192.531412] task: ffff88020af6d3b0 ti: ffff88020af70000 task.ti: ffff88020af70000
[436192.531416] RIP: 0010:[<ffffffffa014af73>] [<ffffffffa014af73>] nf_ct_seq_adjust+0xa3/0x3a0 [nf_conntrack]
[436192.531416] RSP: 0018:ffff88010bc23ca0 EFLAGS: 00010246
[436192.531417] RAX: 0000000000030003 RBX: ffff8800371f0800 RCX: 0000000000000003
[436192.531418] RDX: 000000001a710bff RSI: 0000000000000028 RDI: ffff8801f913ac44
[436192.531418] RBP: ffff8801f913ac40 R08: ffffffff81459ea0 R09: 0000000000000020
[436192.531419] R10: ffff880203600000 R11: ffff8800e7b38000 R12: 0000000000000028
[436192.531419] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88010b84b862
[436192.531420] FS: 0000000000000000(0000) GS:ffff88010bc20000(0000) knlGS:000000

The kernel log ends here, partway through what looks like a second, identical trace on another CPU (#2 vs. #4 earlier).
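
A note on reading the oops, as a rough sketch rather than a diagnosis: in both traces CR2 is 0x4 and R14 is 0, and the Code: line marks the faulting bytes as 41 8b 76 04. The C below decodes those four bytes as a 32-bit load from [base + disp8] and checks the result against an assumed mirror of the per-direction sequence adjustment struct. The struct layout is my recollection of include/net/netfilter/nf_conntrack_seqadj.h and should be verified against the 3.16 tree; the names seqadj_mirror, mem_operand, decode_mov_disp8 and fault_address are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: mirrors the kernel's per-direction seqadj state as I recall
 * it from include/net/netfilter/nf_conntrack_seqadj.h. Verify before use. */
struct seqadj_mirror {
	uint32_t correction_pos;
	int32_t  offset_before;
	int32_t  offset_after;
};

/* Under that assumption, offset_before sits 4 bytes into the struct,
 * which is the value seen in CR2 when the struct pointer is NULL. */
static_assert(offsetof(struct seqadj_mirror, offset_before) == 4,
	      "offset_before expected at offset 4");

/* Hypothetical helper type: a decoded [base + disp8] memory operand. */
struct mem_operand {
	unsigned reg;   /* destination register number, 0-15 */
	unsigned base;  /* base register number, 0-15 */
	int8_t   disp;  /* signed 8-bit displacement */
};

/* Decode "REX 8b /r" with mod=01 (disp8) and no SIB byte.
 * Returns 0 on success, -1 if the bytes are not of that form. */
int decode_mov_disp8(const uint8_t *b, struct mem_operand *out)
{
	uint8_t rex = b[0], opcode = b[1], modrm = b[2];

	if ((rex & 0xf0) != 0x40 || opcode != 0x8b)
		return -1;
	if ((modrm >> 6) != 1 || (modrm & 7) == 4)
		return -1;

	out->reg  = ((modrm >> 3) & 7) | ((rex & 0x4) ? 8u : 0u); /* REX.R */
	out->base = (modrm & 7) | ((rex & 0x1) ? 8u : 0u);        /* REX.B */
	out->disp = (int8_t)b[3];
	return 0;
}

/* Effective address given the value held in the base register. */
uint64_t fault_address(uint64_t base_value, const struct mem_operand *op)
{
	return base_value + (uint64_t)(int64_t)op->disp;
}
```

Feeding in the bytes 41 8b 76 04 gives base register 14 (R14) and displacement 4, so with R14 = 0 the computed address is 0x4, matching CR2. If the layout assumption holds, that is consistent with reading offset_before through a NULL seqadj pointer, but this is an inference from the register dump and not something the trace proves on its own.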

The issue recurs intermittently, but I cannot reproduce it reliably (except that DoS attacks tend to bring the whole thing down). I don't know the nature of the attack, but the machines have easily survived 9 Gbps of iperf tests.

Any help is greatly appreciated while I dig into the code on my own.

Best regards,
Grzegorz Nosek