Re: Kernel oops in 2.6.18.3 with RAID5

From: Andrew Robinson
Date: Tue Feb 20 2007 - 20:15:06 EST


Here is the full dmesg log of the crash:

iret exception: 0000 [#1]
SMP
Modules linked in: ppdev lp button ac battery ipv6 dm_snapshot
dm_mirror dm_mod loop tsdev rtc psmouse parport_pc parport floppy
serio_raw pcspkr i2c_nforce2 snd_intel8x0 snd_ac97_codec snd_
ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc shpchp
pci_hotplug i2c_core nvidia_agp agpgart evdev ext3 jbd mbcache raid456
md_mod xor ide_cd cdrom ide_disk sd_mod generic 8139too
amd74xx ide_core sata_sil 8139cp mii sata_nv libata scsi_mod forcedeth
ehci_hcd ohci_hcd usbcore thermal processor fan
CPU: 0
EIP: 0060:[<de95b0cb>] Not tainted VLI
EFLAGS: 00000216 (2.6.18-3-686 #1)
EIP is at copy_data+0xff/0x14b [raid456]
eax: ddcce000 ebx: 00001000 ecx: 0000000f edx: c1f71000
esi: ddccefc4 edi: c1f71fc4 ebp: 00000000 esp: dd261e4c
ds: 007b es: 007b ss: 0068
Process md0_raid5 (pid: 1115, ti=dd260000 task=dd0ed550 task.ti=dd260000)
Stack: c1f71000 ddb55460 c1e377a0 00000000 ddcce000 00001000 c1f71000 00000000
00000000 00000000 00000000 dd20c388 c1e377a0 dd20c354 de95d96d 0c649510
00000000 c0116d0a 06323c4f 00000000 dd20c384 00000000 c13c52e0 0000e000
Call Trace:
[<de95d96d>] handle_stripe+0x10da/0x2075 [raid456]
[<c0116d0a>] find_busiest_group+0x177/0x46a
[<c011669e>] __wake_up+0x2a/0x3d
[<de95a72c>] __release_stripe+0x10c/0x110 [raid456]
[<de95a751>] release_stripe+0x21/0x2e [raid456]
[<de95ea15>] raid5d+0x10d/0x132 [raid456]
[<de92d769>] md_thread+0xd7/0xed [md_mod]
[<c012d961>] autoremove_wake_function+0x0/0x2d
[<de92d692>] md_thread+0x0/0xed [md_mod]
[<c012d893>] kthread+0xc2/0xef
[<c012d7d1>] kthread+0x0/0xef
[<c0101005>] kernel_thread_helper+0x5/0xb
Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18 8d 34 32 89 34
24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89 c6 c1 e9 02 <f3>
a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00
00 00 e8
EIP: [<de95b0cb>] copy_data+0xff/0x14b [raid456] SS:ESP 0068:dd261e4c
<6>note: md0_raid5[1115] exited with preempt_count 1

I was having instability with this machine before (slackware 10.1 with
2.6.10 kernel) while compiling code (especially the kernel). I just
rebuilt is as a debian box. It never died in the raid array code
before though, just in gcc.

I have tested the machine's ram with memtest86 (3 passes) and will
more thoroughly check it tonight. Besides bad RAM, does anyone have
any other ideas on what may be causing the issue?


On 2/20/07, Andrew Robinson wrote:
I can't seem to find sufficient information on what may have caused an
oops. I am running a debian machine using kernel 2.6.18.3. Here is
detailed information on the system:

debian etch
CPU: AMD athlon 2100+
kernel package: linux-image-2.6.18-3-686
raid5 array: 3 active, 1 spare on md0
raid fs: ext3
raid is physically across 2 on-board NVidia SATA ports and 2 ports
from a SATA controller card

I am at work, and this was a home computer. This is what I got from
syslog when in SSH before it died:

bserver kernel: iret exception: 0000 [#1]
bserver kernel: SMP
bserver kernel: CPU: 0
bserver kernel: EIP is at copy_data+0xff/0x14b [raid456]
bserver kernel: eax: ddcce000 ebx: 00001000 ecx: 0000000f edx: c1f71000
bserver kernel: esi: ddccefc4 edi: c1f71fc4 ebp: 00000000 esp: dd261e4c
bserver kernel: ds: 007b es: 007b ss: 0068
bserver kernel: Process md0_raid5 (pid: 1115, ti=dd260000
task=dd0ed550 task.ti=dd260000)
bserver kernel: Stack: c1f71000 ddb55460 c1e377a0 00000000 ddcce000
00001000 c1f71000 00000000
bserver kernel: 00000000 00000000 00000000 dd20c388 c1e377a0
dd20c354 de95d96d 0c649510
bserver kernel: 00000000 c0116d0a 06323c4f 00000000 dd20c384
00000000 c13c52e0 0000e000
bserver kernel: Call Trace:
bserver kernel: Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18
8d 34 32 89 34 24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89
c6 c1 e9 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00 00
00 e8
bserver kernel: EIP: [<de95b0cb>] copy_data+0xff/0x14b [raid456]
SS:ESP 0068:dd261e4c

The only similar message chain that I could find was about 2.6.19 and
they recommended disabling preempting, but debian's 2.6.18.3 already
has that disabled by default.

Any ideas?

Thanks,
Andrew

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/