Re: Kernel oops in 2.6.18.3 with RAID5

From: Andrew Robinson
Date: Wed Feb 21 2007 - 10:45:43 EST

Next message: Alan: "Re: [PATCH 2.6.21-rc1] serial: serial_txx9 driver update"
Previous message: Peter Zijlstra: "[PATCH 06/29] mm: __GFP_EMERGENCY"
In reply to: Andrew Robinson: "Re: Kernel oops in 2.6.18.3 with RAID5"
Next in thread: Jan Engelhardt: "Re: Kernel oops in 2.6.18.3 with RAID5"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Update: I think that you can ignore this error. I am getting
segmentation faults when I attempt to rebuild the kernel. This is
exactly the same problem I had with slackware 10.1 with the 2.6.10
kernel. So I think it is a hardware issue. Memtest86 didn't show any
errors after 35 passes, so I'll have to check the CPU and motherboard.

Thanks for anyone who spent any time thinking/researching this.

On 2/20/07, Andrew Robinson <andrew.rw.robinson@xxxxxxxxx> wrote:

Here is the full dmesg log of the crash:

iret exception: 0000 [#1]
SMP
Modules linked in: ppdev lp button ac battery ipv6 dm_snapshot
dm_mirror dm_mod loop tsdev rtc psmouse parport_pc parport floppy
serio_raw pcspkr i2c_nforce2 snd_intel8x0 snd_ac97_codec snd_
ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc shpchp
pci_hotplug i2c_core nvidia_agp agpgart evdev ext3 jbd mbcache raid456
md_mod xor ide_cd cdrom ide_disk sd_mod generic 8139too
amd74xx ide_core sata_sil 8139cp mii sata_nv libata scsi_mod forcedeth
ehci_hcd ohci_hcd usbcore thermal processor fan
CPU: 0
EIP: 0060:[<de95b0cb>] Not tainted VLI
EFLAGS: 00000216 (2.6.18-3-686 #1)
EIP is at copy_data+0xff/0x14b [raid456]
eax: ddcce000 ebx: 00001000 ecx: 0000000f edx: c1f71000
esi: ddccefc4 edi: c1f71fc4 ebp: 00000000 esp: dd261e4c
ds: 007b es: 007b ss: 0068
Process md0_raid5 (pid: 1115, ti=dd260000 task=dd0ed550 task.ti=dd260000)
Stack: c1f71000 ddb55460 c1e377a0 00000000 ddcce000 00001000 c1f71000 00000000
00000000 00000000 00000000 dd20c388 c1e377a0 dd20c354 de95d96d 0c649510
00000000 c0116d0a 06323c4f 00000000 dd20c384 00000000 c13c52e0 0000e000
Call Trace:
[<de95d96d>] handle_stripe+0x10da/0x2075 [raid456]
[<c0116d0a>] find_busiest_group+0x177/0x46a
[<c011669e>] __wake_up+0x2a/0x3d
[<de95a72c>] __release_stripe+0x10c/0x110 [raid456]
[<de95a751>] release_stripe+0x21/0x2e [raid456]
[<de95ea15>] raid5d+0x10d/0x132 [raid456]
[<de92d769>] md_thread+0xd7/0xed [md_mod]
[<c012d961>] autoremove_wake_function+0x0/0x2d
[<de92d692>] md_thread+0x0/0xed [md_mod]
[<c012d893>] kthread+0xc2/0xef
[<c012d7d1>] kthread+0x0/0xef
[<c0101005>] kernel_thread_helper+0x5/0xb
Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18 8d 34 32 89 34
24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89 c6 c1 e9 02 <f3>
a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00
00 00 e8
EIP: [<de95b0cb>] copy_data+0xff/0x14b [raid456] SS:ESP 0068:dd261e4c
<6>note: md0_raid5[1115] exited with preempt_count 1

I was having instability with this machine before (slackware 10.1 with
2.6.10 kernel) while compiling code (especially the kernel). I just
rebuilt is as a debian box. It never died in the raid array code
before though, just in gcc.

I have tested the machine's ram with memtest86 (3 passes) and will
more thoroughly check it tonight. Besides bad RAM, does anyone have
any other ideas on what may be causing the issue?

On 2/20/07, Andrew Robinson wrote:
> I can't seem to find sufficient information on what may have caused an
> oops. I am running a debian machine using kernel 2.6.18.3. Here is
> detailed information on the system:
>
> debian etch
> CPU: AMD athlon 2100+
> kernel package: linux-image-2.6.18-3-686
> raid5 array: 3 active, 1 spare on md0
> raid fs: ext3
> raid is physically across 2 on-board NVidia SATA ports and 2 ports
> from a SATA controller card
>
> I am at work, and this was a home computer. This is what I got from
> syslog when in SSH before it died:
>
> bserver kernel: iret exception: 0000 [#1]
> bserver kernel: SMP
> bserver kernel: CPU: 0
> bserver kernel: EIP is at copy_data+0xff/0x14b [raid456]
> bserver kernel: eax: ddcce000 ebx: 00001000 ecx: 0000000f edx: c1f71000
> bserver kernel: esi: ddccefc4 edi: c1f71fc4 ebp: 00000000 esp: dd261e4c
> bserver kernel: ds: 007b es: 007b ss: 0068
> bserver kernel: Process md0_raid5 (pid: 1115, ti=dd260000
> task=dd0ed550 task.ti=dd260000)
> bserver kernel: Stack: c1f71000 ddb55460 c1e377a0 00000000 ddcce000
> 00001000 c1f71000 00000000
> bserver kernel: 00000000 00000000 00000000 dd20c388 c1e377a0
> dd20c354 de95d96d 0c649510
> bserver kernel: 00000000 c0116d0a 06323c4f 00000000 dd20c384
> 00000000 c13c52e0 0000e000
> bserver kernel: Call Trace:
> bserver kernel: Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18
> 8d 34 32 89 34 24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89
> c6 c1 e9 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00 00
> 00 e8
> bserver kernel: EIP: [<de95b0cb>] copy_data+0xff/0x14b [raid456]
> SS:ESP 0068:dd261e4c
>
> The only similar message chain that I could find was about 2.6.19 and
> they recommended disabling preempting, but debian's 2.6.18.3 already
> has that disabled by default.
>
> Any ideas?
>
> Thanks,
> Andrew
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Alan: "Re: [PATCH 2.6.21-rc1] serial: serial_txx9 driver update"
Previous message: Peter Zijlstra: "[PATCH 06/29] mm: __GFP_EMERGENCY"
In reply to: Andrew Robinson: "Re: Kernel oops in 2.6.18.3 with RAID5"
Next in thread: Jan Engelhardt: "Re: Kernel oops in 2.6.18.3 with RAID5"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]