Re: Adaptec driver crashes (3/3)

From: Andrew Morton
Date: Fri May 08 2009 - 15:11:57 EST


On Wed, 6 May 2009 16:13:53 +0900
"Norman Diamond" <n0diamond@xxxxxxxxxxx> wrote:

> A tougher non-100%-reproducible way to crash a Linux system is as follows.
>
> I don't remember exactly what I did, but for some reason I guessed it might
> happen a second time, so I set the console to a text mode terminal before it
> happened the second time (since Linux doesn't give Blue Screens of Death
> otherwise). This is with an Adaptec 1480 card, AIC7xxx driver.
>
> I wish I had a wooden table so I wouldn't have to read and type this stuff
> back in by hand. (In case anyone here doesn't read thedailywtf, ignore the
> part about the wooden table. I still wish I wouldn't have to read and type
> this stuff back in by hand.)
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip: c04a50af *pde = 00000000
> Oops: 0000 [#1] SMP
> Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
> snd_seq_device snd_pcm_oss snd_mixer_oss fuse lp pcspkr snd_intel8x0
> snd_ac97_codec ac97_bus e100 snd_pcm snd_timer snd video mii iTCO_wdt soundcore
> serio_raw iTCO_vendor_support output psmouse evdev pcmcia intel_agp agpgart
> shpchp snd_page_alloc parport_pc parport sg yenta_socket rsrc_nonstatic
> pcmcia_core aufs squashfs sqlzma unlzma
>
> Pid: 3531, comm: klogd Not tainted (2.6.24.3 #1)
> EIP: 0060:[<c04a50af>] EFLAGS: 00010046 CPU: 0
> EIP is at ahc_handle_scsiint+0xdbf/0xef0
> EAX: 00000000 EBX: 00000007 ECX: 00000001 EDX: 0000000d
> ESI: ede17e00 EDI: 00000000 EBP: 00000000 ESP: ed507de4
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process klogd (pid: 3531, ti=ed506000 task=edd6aaa0 task.ti=ed506000)
> Stack: 00000001 00000041 00000001 ee6a6580 d662d853 41410000 000000a0 ead93024
> c01806db 00a0ee08 00000041 00000007 00000000 00000001 00000000 00000000
> ed53b541 00000001 ede17e00 00000064 00000082 0000000b c04b20f9 ede0cd60
> Call Trace:
> [<c01806db>] __link_path_walk+0xaab/0xe10
> [<c04b20f9>] ahc_linux_isr+0x1e9/0x260
> [<c0151025>] handle_IRQ_event+0x25/0x50
> [<c01529bc>] handle_level_irq+0x7c/0xf0
> [<c010748b>] do_IRQ+0x3b/0x70
> [<efbe3d90>] aufs_getattr+0x0/0xa0 [aufs]
> [<c01052d3>] common_interrupt+0x23/0x30
> [<efbe3d90>] aufs_getattr+0x0/0xa0 [aufs]
> [<efbe3d9e>] aufs_getattr+0xe/0xa0 [aufs]
> [<c017fa47>] getname+0xa7/0xc0
> [<c03b7acf>] security_inode_getattr+0x1f/0x30
> [<c017a4f8>] vfs_getattr+0x48/0x70
> [<c017a727>] vfs_stat_fd+0x37/0x60
> [<c017a82f>] sys_stat64+0xf/0x30
> [<c01775ee>] vfs_write+0x11e/0x140
> [<c0177c31>] sys_write+0x41/0x70
> [<c012cc1a>] sys_time+0xa/0x30
> [<c0104352>] syscall_call+0x7/0xb
> [<c0700000>] rpcb_getport_prepare+0x10/0x40
> =======================
> Code: 24 2c e8 c5 95 ff ff b9 14 00 00 00 89 f0 8d 54 24 2c c7 44 24 04 00 00 00
> 00 c7 04 24 b6 d1 80 c0 e8 56 e9 ff ff e9 8d f8 ff ff <8b> 07 89 fa 0f b6 58 1b
> 0f b6 c3 89 44 24 1c 89 f0 e8 5b a5 00
> EIP: [<c04a50af>] ahc_handle_scsiint+0xdbf/0xef0 SS:ESP 0068:ed507de4

ahc_handle_scsiint() is a huge function, so it would help if we could
find the file and line where it is crashing. Could you please do the
following:

- Run a more recent kernel: we might have fixed it since 2.6.24!

- Enable CONFIG_DEBUG_INFO and rebuild, so that vmlinux carries
  line-number information.

- Reproduce the crash and note the EIP address (c04a50af in this example).

- In your kernel build directory, run

	gdb vmlinux
	(gdb) l *0xc04a50af

  (substituting the EIP address from the new crash for c04a50af)

Alternatively, try the same steps with your current 2.6.24 setup.

Alternatively, see if you can get the poorly-documented
scripts/markup_oops.pl to work.
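
If retyping the address by hand is a nuisance, the gdb lookup described
above can be scripted. What follows is only a rough sketch, not anything
shipped with the kernel: the function name gdb_commands() and the regular
expression are made up for illustration, and the pattern assumes the i386
oops layout shown in the report (a final "EIP: [<addr>] symbol+0xoff/0xsize"
line). It prints two gdb invocations, one using the raw address and one
using the symbol+offset form, which can be handy when the address itself
is not stable between builds. The -batch and -ex options are ordinary gdb
command-line options.

```python
import re

# Matches the summary line at the end of an i386 oops, for example:
#   EIP: [<c04a50af>] ahc_handle_scsiint+0xdbf/0xef0 SS:ESP 0068:ed507de4
# It deliberately does not match the earlier "EIP: 0060:[<...>] EFLAGS" line.
EIP_RE = re.compile(
    r"EIP:\s*\[<(?P<addr>[0-9a-f]{8})>\]\s+"
    r"(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def gdb_commands(oops_text, vmlinux="vmlinux"):
    """Return gdb command lines that list the source at the faulting EIP.

    Hypothetical helper: 'vmlinux' is assumed to be a kernel image built
    with CONFIG_DEBUG_INFO from the same tree that produced the oops.
    """
    m = EIP_RE.search(oops_text)
    if m is None:
        raise ValueError("no 'EIP: [<addr>] symbol+0xoff/0xsize' line found")
    addr, sym, off = m.group("addr"), m.group("sym"), m.group("off")
    return [
        # Same lookup as typing "l *0x<addr>" at the (gdb) prompt.
        f"gdb -batch -ex 'list *0x{addr}' {vmlinux}",
        # Equivalent lookup by symbol plus offset.
        f"gdb -batch -ex 'list *({sym}+0x{off})' {vmlinux}",
    ]


if __name__ == "__main__":
    sample = (
        "EIP: 0060:[<c04a50af>] EFLAGS: 00010046 CPU: 0\n"
        "EIP: [<c04a50af>] ahc_handle_scsiint+0xdbf/0xef0 "
        "SS:ESP 0068:ed507de4\n"
    )
    for cmd in gdb_commands(sample):
        print(cmd)
```

Run the printed commands from the kernel build directory. If the oops
text was quoted in a mail (leading "> "), the search still finds the EIP
line, since it does not anchor at the start of a line.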

