BUG: unable to handle kernel NULL pointer deref, bisected to 746650160

From: Torsten Luettgert
Date: Wed Apr 08 2015 - 12:57:49 EST


Hello,

Since 3.17, I have been getting NULL pointer dereference BUGs on one of
my Supermicro machines. They occur at random uptimes, often a few hours
after booting (the longest uptime so far was 2 days).

I bisected the problem (it took a while); the offending commit appears
to be 746650160866 ("scsi: convert host_busy to atomic_t") by
Christoph Hellwig.

Here's one of the logs (it's always the same trace):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff8133af60>] swiotlb_unmap_sg_attrs+0x30/0x80
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: iTCO_wdt iTCO_vendor_support lpc_ich mfd_core usb_storage
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-74665016086615bb+ #1
Hardware name: Supermicro X8DTT/X8DTT, BIOS 080016 10/05/2010
task: ffffffff81c16480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
RIP: 0010:[<ffffffff8133af60>]  [<ffffffff8133af60>] swiotlb_unmap_sg_attrs+0x30/0x80
RSP: 0018:ffff88063fc03e08  EFLAGS: 00010002
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 000000090e2ef000 RDI: ffff880c14e61a00
RBP: ffff88063fc03e38 R08: 0000000000000000 R09: ffff8806209cc098
R10: ffff88063f400120 R11: 0000000000001268 R12: 0000000000000002
R13: 0000000000000002 R14: ffff8806209cc098 R15: ffff880c200fcc70
FS:  0000000000000000(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 0000000001c11000 CR4: 00000000000027e0
Stack:
0000000000000094 0000000000000094 ffff880c200f8718 0000000000000094
0000000000000094 0000000000000094 ffff88063fc03e48 ffffffff8146a0b4
ffff88063fc03e88 ffffffff81477c1d ffff88063fc03e78 ffff880c213a57c0
Call Trace:
<IRQ>
[<ffffffff8146a0b4>] scsi_dma_unmap+0x54/0x70
[<ffffffff81477c1d>] twl_interrupt+0x26d/0x420
[<ffffffff810fe2fd>] handle_irq_event_percpu+0x5d/0x1c0
[<ffffffff810fe4a2>] handle_irq_event+0x42/0x70
[<ffffffff8110165b>] handle_fasteoi_irq+0x5b/0x100
[<ffffffff81053fdc>] handle_irq+0x5c/0x150
[<ffffffff810c8f72>] ? __atomic_notifier_call_chain+0x12/0x20
[<ffffffff810c8f96>] ? atomic_notifier_call_chain+0x16/0x20
[<ffffffff81776f6e>] do_IRQ+0x5e/0x110
[<ffffffff817754ea>] common_interrupt+0x6a/0x6a
<EOI>
[<ffffffff815de8c3>] ? cpuidle_enter_state+0x53/0xd0
[<ffffffff815de8bf>] ? cpuidle_enter_state+0x4f/0xd0
[<ffffffff815de957>] cpuidle_enter+0x17/0x20
[<ffffffff810e95a4>] cpuidle_idle_call+0xc4/0x250
[<ffffffff810e9855>] cpu_idle_loop+0x125/0x1d0
[<ffffffff810e9913>] cpu_startup_entry+0x13/0x20
[<ffffffff81769597>] rest_init+0x77/0x80
[<ffffffff81d74344>] start_kernel+0x39a/0x3a1
[<ffffffff81d73dc8>] ? set_init_arg+0x5d/0x5d
[<ffffffff8176f1ad>] ? memblock_reserve+0x4c/0x51
[<ffffffff81d735ad>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff81d736f0>] x86_64_start_kernel+0x141/0x148
Code: 56 49 89 fe 41 55 41 89 cd 41 54 41 89 d4 53 48 83 ec 10 83 f9 03 74 5e 31 db 85 d2 48 89 f0 7e 48 66 2e 0f 1f 84 00 00 00 00 00 <48> 8b 70 10 48 3b 35 d5 16 e0 00 8b 50 18 72 1e 48 3b 35 d1 16
RIP  [<ffffffff8133af60>] swiotlb_unmap_sg_attrs+0x30/0x80
 RSP <ffff88063fc03e08>
CR2: 0000000000000010
---[ end trace 4e21be7f8b16aadd ]---

The same problem was reported by Kui Zhang last October with the
subject "3.17.0-rc7 kernel NULL pointer dereference (3ware 9650SE)".
Unfortunately (for me), nobody replied.

We have a 3ware controller, too, but ours is a 9750. Controller
firmware and BIOS are current.

Any help with this is greatly appreciated.

Regards,
Torsten