Re: hpsa driver bug crack kernel down!

From: Bjorn Helgaas
Date: Thu Apr 10 2014 - 12:02:54 EST


[+cc Steve and iss_storagedev, remove "storagedev" which bounced
(apparent typo)]

On Thu, Apr 10, 2014 at 9:43 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
> On Tue, Apr 8, 2014 at 8:39 PM, Baoquan He <bhe@xxxxxxxxxx> wrote:
>> Hi,
>>
>> The kernel is 3.14.0+ which is pulled just now.
>>
>>
>> [ 18.402695] systemd[1]: Set hostname to
>> <hp-sl4545g7-01.rhts.eng.bos.redhat.com>.
>> [ 18.408456] random: systemd urandom read with 70 bits of entropy
>> available
>> [ 18md[1]: Expecting device
>> dev-mapper-rhel_hp\x2d\x2dsl4545g7\x2d\x2d01\x2droot.device...
>> Expecting device
>> dev-mapper-rhel_hp\x2d\x2dsl4545g7\...droot.device...
>> [ 18.860704] systemd[1]: Starting -.slice.
>> [ OK ] Created slice -.slice.
>> [ 18.866030] systemd[1]: Created slice -.slice.
>> [ 18.869466] systemd[1]: Starting System Slice.
>> [ OK ] Created slice System Sl 18.939116] systemd[1]: Created
>> slice System Slice.
>> [ 18.976213] systemd[1]: Starting Slices.
>> [ OK ] Reached target Slices.
>> [ 18.981154] systemd[1]: Reached target Slices.
>> [ 18.984183] systemd[1]: Starting Timers.
>> [ OK ] Reached target Timers.
>> [ 18.989161] systemd[1]: Reached target Timers.
>> [ 18.992004] systemd[1]: Starting Journal Socket.
>> [ OK ] Listening on Journal Socket.
>> [ 18.997174] systemd[1]: Listening on Journal Socket.
>> [ 19.000702] systemd[1]: Starting dracut cmdline hook...
>> Starting dracut cmdline hook...
>> [ 19.006697] systemd[1]: Started Load KernModules.
>> [ 19.110408] systemd[1]: Starting Setup Virtual Console...
>> Starting Setup Virtual Console...
>> [ 19.116652] systemd[1]: Starting Journal Service...
>> Starting Journal Service...
>> [ OK ] Started Journal Service.
>> [ 19.127172] systemd[1]: Started Journal Service.
>> [ OK ] Listening on udev Kernel Socket.
>> [ 19.141504] systemd-journald[281]: Vac[ OK ] Listening on udev
>> Control Socket.
>> [ OK ] Reached target Sockets.
>> Starting Create list of required static device nodes...rrent
>> kernel...
>> Starting Apply Kernel Variables...
>> [ OK ] Reached target Swap.
>> [ OK ] Reached target Local File Systems.
>> [ OK ] Started dracut cmdline hook.
>> [ OK ] Started Setup Virtual Console.
>> [ OK ] Started Apply Kernel Variables.
>> [ OK ] Started Create list of required static device nodes ...current
>> kernel.
>> Starting Create static device nodes in /dev...
>> Starting dracut pre-udev hook...
>> [ OK ] Started Create static device nodes in /dev.
>> [ 20.247819] device-mapper: uevent: version 1.0.3
>> [ 20.251101] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30)
>> initialised: dm-devel@xxxxxxxxxx
>> [ OK ] Started dracut pre-udev hook.
>> Starting udev Kernel Device Manager...
>> [ 20.322923] systemd-udevd[335]: starting version 208
>> [ OK ] Started udev Kernel Device Manager.
>> Starting udev Coldplug all Devices...
>> Mounting Configuration File System...
>> [ OK ] Mounted Configuration File System.
>> [ OK ] Started udev Coldplug all Devices.
>> Starting dracut initqueue hook...
>> [ OK ][1] HP HPSA Driver (v 3.4.4-1)
>> [ 20.832850] hpsa 0000:05:00.0: can't disable ASPM; OS doesn't have
>> ASPM control
>> Reached target System Initialization.
>> [ 20.875178] ACPI: PCI Interrupt Link [I0C0] enabled at IRQ 36
>> [ 20.909000] hpsa 0000:05:00.0: MSIX
>> [ 20.911586] hpsa 0000:05:00.0: Logical aborts not supported
>> [ 20.916004] [drm] Initialized drm 1.1.0 20060810
>> [ 20.936139] hpsa 0000:05:00.0: hpsa0: <0x323b> at IRQ 73 using DAC
>> [ 20.956967] BUG: unable to handle kernel NULL pointer dereference at
>> (null)
>> [ 20.956997] IP: [<ffffffffa004b97f>]
>> hpsa_enter_performant_mode+0x4ff/0x580 [hpsa]
>> [ 20.957003] PGD 0
>> [ 20.957012] Oops: 0002 [#1] SMP
>> [ 20.957035] Modules linked in: drm(+) libata hpsa(+) i2c_core
>> dm_mirror dm_region_hash dm_log dm_mod
>> [ 20.957046] CPU: 10 PID: 341 Comm: systemd-udevd Not tainted 3.14.0+
>> #28
>> [ 20.957049] Hardware name: HP ProLiant SL4545 G7/, BIOS A31
>> 12/08/2012
>> [ 20.957055] task: ffff880824191b40 ti: ffff88082309c000 task.ti:
>> ffff88082309c000
>> [ 20.957078] RIP: 0010:[<ffffffffa004b97f>] [<ffffffffa004b97f>]
>> hpsa_enter_performant_mode+0x4ff/0x580 [hpsa]
>> [ 20.957083] RSP: 0018:ffff88082309da18 EFLAGS: 00010297
>> [ 20.957088] RAX: 0000000000000000 RBX: 000000007c000167 RCX:
>> 0000000000000004
>> [ 20.957091] RDX: 000000000000
>
> What happened with this original report? This looks like a different
> problem than the DMA fault reported by Davidlohr. I'd start by
> disassembling the hpsa module and matching the IP to a line.
> Documentation/oops-tracing.txt might have useful tips on how to do
> that.
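
As a rough illustration of "matching the IP to a line", here is a minimal
Python sketch that pulls the symbol, offset, and module out of the RIP line
above and builds the matching gdb "list" command. The helper names
(parse_oops_symbol, gdb_list_command) are hypothetical and exist only for this
example. It also assumes the module was built with debug info
(CONFIG_DEBUG_INFO) and that the hpsa.ko you load into gdb is the same build
that oopsed.

```python
import re

# Matches the "symbol+0xoffset/0xsize [module]" form printed in an oops.
# The "[module]" suffix is optional (built-in code has none).
SYMBOL_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_oops_symbol(text):
    """Return symbol, offset, size, and module from an oops line, or None."""
    m = SYMBOL_RE.search(text)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # bytes into the function
        "size": int(m.group("size"), 16),      # total function length
        "module": m.group("module"),           # None for built-in code
    }


def gdb_list_command(info):
    """Build the gdb command that shows the source around symbol+offset."""
    return "list *({symbol}+{offset:#x})".format(**info)


# Symbol text copied from the RIP line in the report above.
rip = "hpsa_enter_performant_mode+0x4ff/0x580 [hpsa]"
info = parse_oops_symbol(rip)
print(gdb_list_command(info))
```

Running "gdb drivers/scsi/hpsa.ko" from the build tree (the path is an
assumption about where the module was built) and pasting the printed command
should show the source line at the faulting instruction. "Oops: 0002"
indicates a kernel-mode write to a not-present page, so the line in question
is most likely storing through a NULL pointer.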