Re: [PATCH] kdump: Fix for boot problems on SMP

From: Badari Pulavarty
Date: Wed Nov 24 2004 - 17:03:36 EST


Hari,


I have a success case and a failure case to report.

1) Success first.. I was able to save /proc/vmcore when my machine
panicked (not through sysrq), and gdb showed the stack correctly :)

For some reason, gdb failed to show the stack correctly when I
ran it on /proc/vmcore directly while still in the kexec kernel :(

# gdb ../l*9/vmlinux vmcore.3
...
Core was generated by `root=/dev/sda2 dump init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M console='.
#0 crash_get_current_regs (regs=0xc050b000)
at arch/i386/kernel/crash_dump.c:98
98 }
(gdb) bt
#0 crash_get_current_regs (regs=0xc050b000)
at arch/i386/kernel/crash_dump.c:98
#1 0xc0139986 in __crash_machine_kexec () at kernel/crash.c:83
#2 0xc011b2aa in panic (fmt=0xc050b000 "") at
include/linux/crash_dump.h:21
#3 0xc0104ed5 in die (str=0x0, regs=0x1, err=2)
at arch/i386/kernel/traps.c:392
#4 0xc0113ad2 in do_page_fault (regs=0xd4937edc, error_code=2)
at arch/i386/mm/fault.c:480
#5 0xc0104707 in error_code () at /tmp/ccK5IM1b.s:2135
#6 0xc017a55e in aio_put_req (req=0x0) at fs/aio.c:529
#7 0xc017ba0d in io_submit_one (ctx=0xd46fddc0, user_iocb=0xbfffecb0,
iocb=0xf75af124) at fs/aio.c:1551
#8 0xc017baf1 in sys_io_submit (ctx_id=3226513408, nr=32,
iocbpp=0xbfffec30)
at fs/aio.c:1609
#9 0xc0103c63 in syscall_call () at /tmp/ccK5IM1b.s:1946
#10 0xc0407220 in default_exec_domain ()
(gdb) q

2) Failure case:

When I reproduced the panic, the kernel tried to run kexec, hit an
exception in the kexec code, and the machine hung.

Here is the console output:

Unable to handle kernel NULL pointer dereference at virtual address 00000020
printing eip:
c128c044
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c128c044>] Not tainted VLI
EFLAGS: 00010086 (2.6.10-rc2-mm2kexec)
EIP is at _spin_lock_irq+0x4/0x20 <<<<<<<<<**** my original panic
eax: 00000020 ebx: c2dd77e0 ecx: c2821bb0 edx: c2821b80
esi: 00000020 edi: 00000000 ebp: c1dd9f10 esp: c1dd9f10
ds: 007b es: 007b ss: 0068
Process aio_tio (pid: 8084, threadinfo=c1dd8000 task=c2110570)
Stack: c1dd9f2c c107a56e c1dd9f18 c1dd9f18 c2821ba0 c2dd77e0 c1dd9f70 c1dd9f54
       c107ba1d c2821b80 00000000 00000000 bfffecb0 c2821b80 c2821b80 00000000
       bfffec30 c1dd9fbc c107bb01 c1dd9f70 bfffecb0 00000040 bfffecb0 00000000
Call Trace:
[<c1004aaf>] show_stack+0x7f/0xa0
[<c1004c5e>] show_registers+0x15e/0x1c0
[<c1004e62>] die+0xf2/0x180
[<c1013ad2>] do_page_fault+0x3b2/0x710
[<c1004707>] error_code+0x2b/0x30
[<c107a56e>] aio_put_req+0x1e/0x90
[<c107ba1d>] io_submit_one+0x20d/0x250
[<c107bb01>] sys_io_submit+0xa1/0x110
[<c1003c63>] syscall_call+0x7/0xb
Code: fe 0a 79 12 a9 00 02 00 00 74 01 fb f3 90 80 3a 00 7e f9 fa eb e9
5d c3 90 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 fa <f0> fe
08 79 09 f3 90 80 38 00 7e f9 eb f2 5d c3 8d b6 00 00 00
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
<0>kexec: opening parachute <<<<<<<<<<*** trying to kexec ?
Unable to handle kernel paging request at virtual address c30a0000
printing eip:
c1039956
*pde = 00000000
Oops: 0002 [#2]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c1039956>] Not tainted VLI
EFLAGS: 00010206 (2.6.10-rc2-mm2kexec)
EIP is at __crash_machine_kexec+0x66/0x110 <<<<<<** panic in kexec
eax: 00005400 ebx: c2003180 ecx: 000001e0 edx: 00000001
esi: c140b000 edi: c30a0000 ebp: c1dd9d98 esp: c1dd9d80
ds: 007b es: 007b ss: 0068
Process aio_tio (pid: 8084, threadinfo=c1dd8000 task=c2110570)
Stack: c140b000 c1dd9d94 c1dd9d98 c1dd8000 c1dd9edc c12a01d5 c1dd9db4 c101b2aa
       00000000 c140c380 c129e8dd c1dd9dc0 c1dd8000 c1dd9df8 c1004ed5 c129e8ce
       00000001 c1dd9dcc 00000001 c1dd9edc c12a01d5 00000002 000000ff 0000000b
Call Trace:
[<c1004aaf>] show_stack+0x7f/0xa0
[<c1004c5e>] show_registers+0x15e/0x1c0
[<c1004e62>] die+0xf2/0x180
[<c1013ad2>] do_page_fault+0x3b2/0x710
[<c1004707>] error_code+0x2b/0x30
[<c101b2aa>] panic+0x5a/0x120
[<c1004ed5>] die+0x165/0x180
[<c1013ad2>] do_page_fault+0x3b2/0x710
[<c1004707>] error_code+0x2b/0x30
[<c107a56e>] aio_put_req+0x1e/0x90
[<c107ba1d>] io_submit_one+0x20d/0x250
[<c107bb01>] sys_io_submit+0xa1/0x110
[<c1003c63>] syscall_call+0x7/0xb
Code: 2a c1 be 01 00 00 00 89 35 a4 c7 40 c1 e8 03 22 fe ff 8b 0d a4 c7
40 c1 85 c9 75 6c bf 00 00 0a c3 be 00 b0 40 c1 b9 e0 01 00 00 <f3> a5
c7 04 24 80 07 0a c3 c7 44 24 04 80 b7 40 c1 c7 44 24 08
<0>Fatal exception: panic in 5 seconds
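
For reference, a minimal sketch of how the two oopses above can be read.
It assumes the standard i386 page-fault error-code bits (bit 0 = protection
violation, bit 1 = write, bit 2 = user mode) and the one-byte
"mov r32, imm32" opcodes b9/be/bf for ecx/esi/edi. The helper names are made
up for illustration, and decode_tail() is not a disassembler: it only
handles the three mov-immediates that sit right before the <f3> marker in
the second Code: line.

```python
import struct

# Assumption: standard i386 page-fault error-code bits (not stated in this mail).
PF_PROT, PF_WRITE, PF_USER = 0x1, 0x2, 0x4


def decode_oops_code(code_hex):
    """Decode the hex number printed after 'Oops:' (hypothetical helper)."""
    code = int(code_hex, 16)
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


# One-byte "mov r32, imm32" opcodes that appear just before the fault.
MOV_IMM32 = {0xB9: "ecx", 0xBE: "esi", 0xBF: "edi"}


def decode_tail(code_line, count=3):
    """Decode the last `count` 5-byte mov-immediates before the <..> marker."""
    before = code_line.split("<")[0].split()
    raw = bytes(int(b, 16) for b in before)
    tail = raw[-5 * count:]
    regs = {}
    for i in range(0, len(tail), 5):
        (imm,) = struct.unpack("<I", tail[i + 1:i + 5])  # little-endian imm32
        regs[MOV_IMM32[tail[i]]] = imm
    return regs


# Excerpt of the Code: line from oops #2, ending at the faulting bytes.
CODE = "85 c9 75 6c bf 00 00 0a c3 be 00 b0 40 c1 b9 e0 01 00 00 <f3> a5"

print(decode_oops_code("0002"))
regs = decode_tail(CODE)
print({name: hex(value) for name, value in regs.items()})
print("bytes to copy:", regs["ecx"] * 4)
```

Read this way, both oopses are kernel-mode writes to a page that is not
present. In the second one, the bytes before <f3> a5 (rep movsl) load
edi=c30a0000, esi=c140b000 and ecx=1e0, which match the register dump, so
the copy appears to have faulted on its very first store to c30a0000.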




Thanks,
Badari

On Tue, 2004-11-23 at 10:15, Hariprasad Nellitheertha wrote:
> Hi Badari,
>
> Badari Pulavarty wrote:
> > More info testing results...
> >
> > gdb is not showing the stack info properly on my saved vmcore.
> > I thought vmlinux was not matching the vmcore, so I verified that
> > vmcore and vmlinux match up. But still no luck...
>
> I will try to recreate this using the 'sysrq' method you described in
> the earlier mail. Will let you know my findings asap.
>
> Thanks very much for trying kdump!
>
> Regards, Hari
>
