crash on x86_64 - mm related?

From: Ryan Richter
Date: Tue Nov 29 2005 - 10:43:46 EST


Hi, I booted 2.6.14.2 with the MPT fusion performance fix patch about a
week ago on my file server. The machine crashed lat night while it was
doing backups. You can see the voluminous kernel output below.

Someone else recently had seemingly the same thing happen, but didn't
think it was a kernel problem. You can read about it here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=338335

I will reply later today with the kernel .config, right now I have to
wait for someone to reboot the machine first.

Any help would be appreciated,
-ryan

Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8)
flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0
Backtrace:

Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a965>{free_hot_cold_page+101}
<ffffffff80162007>{__page_cache_release+151} <ffffffff802b8fe8>{sgl_unmap_user_pages+120}
<ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697}
<ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
<ffffffff8010db7a>{system_call+126}
Trying to fix it up, but a reboot is needed
Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8)
flags:0x010000000000081c mapping:ffff81005c0fc310 mapcount:0 count:0
Backtrace:

Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a965>{free_hot_cold_page+101}
<ffffffff80162007>{__page_cache_release+151} <ffffffff802b8fe8>{sgl_unmap
_user_pages+120}
<ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697}
<ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
<ffffffff8010db7a>{system_call+126}
Trying to fix it up, but a reboot is needed
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at include/linux/mm.h:341
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: bonding
Pid: 2418, comm: taper Tainted: G B 2.6.14.2 #1
RIP: 0010:[<ffffffff802b8fcd>] <ffffffff802b8fcd>{sgl_unmap_user_pages+93}
RSP: 0018:ffff810035725e18 EFLAGS: 00010256
RAX: 0000000000000000 RBX: 0000000000000007 RCX: 000000000000000f
RDX: 00000000000000e0 RSI: 0000000000000001 RDI: ffff81000260b6f8
RBP: ffff810004852068 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000008000 R11: 0000000000000200 R12: 0000000000000008
R13: 0000000000000000 R14: 0000000000008000 R15: ffff810004949d10
FS: 00002aaaab53d880(0000) GS:ffffffff804db880(0000) knlGS:00000000556b6920
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaaac0000 CR3: 0000000035691000 CR4: 00000000000006e0
Process taper (pid: 2418, threadinfo ffff810035724000, task ffff81017d680300)
Stack: ffff8101423f3600 ffff810004852000 0000000000000040 0000000000008000
ffff810004949c00 ffffffff802b48fb ffff810004852000 ffffffff802b4fb1
ffff810000000000 ffffffff00000001
Call Trace:<ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697}
<ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
<ffffffff8010db7a>{system_call+126}

Code: 0f 0b 68 ba 12 3a 80 c2 55 01 f0 83 47 08 ff 0f 98 c0 84 c0
RIP <ffffffff802b8fcd>{sgl_unmap_user_pages+93} RSP <ffff810035725e18>
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/rmap.c:487
invalid operand: 0000 [2] SMP
CPU 1
Modules linked in: bonding
Pid: 2418, comm: taper Tainted: G B 2.6.14.2 #1
RIP: 0010:[<ffffffff8016f3f7>] <ffffffff8016f3f7>{page_remove_rmap+39}
RSP: 0018:ffff810035725ab0 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff8100356976f8 RCX: ffff81000000f000
RDX: 0000000000000000 RSI: 8000000064c69067 RDI: ffff81000260b6f8
RBP: 00002aaaaaadf000 R08: 0000000000000000 R09: ffff81000260b688
R10: 00000000fffffffa R11: 0000000000000000 R12: ffff810101c22380
R13: 8000000064c69067 R14: ffff81000260b6f8 R15: 0000000000000000
FS: 00002aaaab53d880(0000) GS:ffffffff804db880(0000) knlGS:00000000556b6920
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaaac0000 CR3: 0000000035691000 CR4: 00000000000006e0
Process taper (pid: 2418, threadinfo ffff810035724000, task ffff81017d680300)
Stack: ffffffff80166ecd 00002aaaaab62000 ffff810035696aa8 00002aaaaab62000
00002aaaaab62000 00002aaaaab61fff ffff810035695550 00002aaaaab62000
ffffffff80167180 ffff810035725d68
Call Trace:<ffffffff80166ecd>{zap_pte_range+477} <ffffffff80167180>{unmap_page_range+496}
<ffffffff801672e5>{unmap_vmas+293} <ffffffff8016cfa2>{exit_mmap+162}
<ffffffff80131ce1>{mmput+49} <ffffffff801371c6>{do_exit+438}
<ffffffff8010f6f1>{die+81} <ffffffff8010f9df>{do_invalid_op+159}
<ffffffff802b8fcd>{sgl_unmap_user_pages+93} <ffffffff80381f76>{thread_return+86}
<ffffffff802a8662>{sym_setup_data_and_start+402} <ffffffff8010e84d>{error_exit+0}
<ffffffff802b8fcd>{sgl_unmap_user_pages+93} <ffffffff802b8fe8>{sgl_unmap_user_pages+120}
<ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697}
<ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
<ffffffff8010db7a>{system_call+126}

Code: 0f 0b 68 9b 35 3a 80 c2 e7 01 48 c7 c6 ff ff ff ff bf 20 00
RIP <ffffffff8016f3f7>{page_remove_rmap+39} RSP <ffff810035725ab0>
<1>Fixing recursive fault but reboot is needed!
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
<ffffffff801b9c7b>{ext3_prepare_write+27}
PGD 355bc067 PUD 355c9067 PMD 0
Oops: 0000 [3] SMP
CPU 0
Modules linked in: bonding
Pid: 2416, comm: driver Tainted: G B 2.6.14.2 #1
RIP: 0010:[<ffffffff801b9c7b>] <ffffffff801b9c7b>{ext3_prepare_write+27}
RSP: 0018:ffff8100355e7b48 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffffffff8040f660 RCX: 000000000000017d
RDX: 0000000000000094 RSI: ffff81000260b6f8 RDI: ffff810035b09cc0
RBP: 000000000000000e R08: 00000000fffffffa R09: 00000000000000e9
R10: ffff81001190c818 R11: 0000000000000000 R12: ffff81000260b6f8
R13: ffff81000260b6f8 R14: 000000000000017d R15: 0000000000000094
FS: 00002aaaab53d8e0(0000) GS:ffffffff804db800(0000) knlGS:00000000555bc920
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000035555000 CR4: 00000000000006e0
Process driver (pid: 2416, threadinfo ffff8100355e6000, task ffff8100f43e8a80)
Stack: ffff81014e643310 ffffffff8040f660 000000000000000e ffff81000260b6f8
ffff81005c0fc310 0000000000000094 00000000000000e9 ffffffff80158247
0000000000000292 00002aaaaaac0000
Call Trace:<ffffffff80158247>{generic_file_buffered_write+551}
<ffffffff801c06bd>{__ext3_journal_stop+45} <ffffffff8019ec74>{__mark_inode_dirty+52}
<ffffffff801963ec>{inode_update_time+188} <ffffffff80158a08>{__generic_file_aio_write_nolock+936}
<ffffffff80381f76>{thread_return+86} <ffffffff8013d349>{lock_timer_base+41}
<ffffffff80158cfe>{generic_file_aio_write+110} <ffffffff801b7783>{ext3_file_write+35}
<ffffffff8017ae43>{do_sync_write+211} <ffffffff8018ecc0>{__pollwait+0}
<ffffffff8014a2b0>{autoremove_wake_function+0} <ffffffff8018f681>{sys_select+1153}
<ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
<ffffffff8010db7a>{system_call+126}

Code: 48 8b 28 48 89 ef e8 aa 26 00 00 c7 44 24 04 00 00 00 00 89
RIP <ffffffff801b9c7b>{ext3_prepare_write+27} RSP <ffff8100355e7b48>
CR2: 0000000000000000
<0>Bad page state at prep_new_page (in process 'dumper', page ffff81000260b6f8)
flags:0x010000000000001d mapping:0000000000000000 mapcount:-1 count:1
Backtrace:

Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a371>{prep_new_page+65}
<ffffffff8015ab2e>{buffered_rmqueue+302} <ffffffff8015ad85>{__alloc_pages+261}
<ffffffff801581bd>{generic_file_buffered_write+413}
<ffffffff80139509>{current_fs_time+105} <ffffffff8019636e>{inode_update_time+62}
<ffffffff80158a08>{__generic_file_aio_write_nolock+936}
<ffffffff8031f4a4>{sock_common_recvmsg+52} <ffffffff8031bb30>{sock_aio_read+272}
<ffffffff80158cfe>{generic_file_aio_write+110} <ffffffff801b7783>{ext3_file_write+35}
<ffffffff8017ae43>{do_sync_write+211} <ffffffff8018ecc0>{__pollwait+0}
<ffffffff8014a2b0>{autoremove_wake_function+0} <ffffffff8018f681>{sys_select+1153}
<ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
<ffffffff8010db7a>{system_call+126}
Trying to fix it up, but a reboot is needed
Bad page state at prep_new_page (in process 'find', page ffff81000260b6f8)
flags:0x0100000000000064 mapping:ffff8100f3be9be9 mapcount:1 count:1
Backtrace:

Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a371>{prep_new_page+65}
<ffffffff8015ab2e>{buffered_rmqueue+302} <ffffffff8015ad85>{__alloc_pages+261}
<ffffffff8015e7a3>{kmem_getpages+99} <ffffffff8015fbb0>{cache_grow+192}
<ffffffff8015fe3b>{cache_alloc_refill+459} <ffffffff80160226>{kmem_cache_alloc+54}
<ffffffff80193831>{d_alloc+33} <ffffffff80188fe9>{real_lookup+105}
<ffffffff801893c0>{do_lookup+112} <ffffffff80189e07>{__link_path_walk+2551}
<ffffffff8018a382>{link_path_walk+178} <ffffffff8018a8ce>{path_lookup+446}
<ffffffff8018aa9e>{__user_walk+62} <ffffffff801849b6>{vfs_lstat+38}
<ffffffff80184dff>{sys_newlstat+31} <ffffffff8010db7a>{system_call+126}

Trying to fix it up, but a reboot is needed
Unable to handle kernel paging request at 00002aaaab9c5b61 RIP:
<ffffffff8015fdba>{cache_alloc_refill+330}
PGD c2512067 PUD c2513067 PMD 0
Oops: 0002 [4] SMP
CPU 0
Modules linked in: bonding
Pid: 3011, comm: find Tainted: G B 2.6.14.2 #1
RIP: 0010:[<ffffffff8015fdba>] <ffffffff8015fdba>{cache_alloc_refill+330}
RSP: 0018:ffff810112f05c28 EFLAGS: 00010082
RAX: 00002aaaab9c5b59 RBX: 0000000000000010 RCX: 0000000000029ba6
RDX: 00002aaaab9c5bb3 RSI: ffff810064c69040 RDI: ffff81000c01a288
RBP: ffff8100f6fc4800 R08: ffff81000c01a250 R09: ffff81000c01a260
R10: 0000000000000000 R11: 0000000000000000 R12: ffff81000c01a240
R13: ffff8100f6fc3640 R14: ffff81000c01a288 R15: 00000000000000d0
FS: 00002aaaaae00640(0000) GS:ffffffff804db800(0000) knlGS:00000000555bc920
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaab9c5b61 CR3: 00000000c2bb3000 CR4: 00000000000006e0
Process find (pid: 3011, threadinfo ffff810112f04000, task ffff810102b46040)
Stack: ffff810112f05e68 ffff810179923cb8 fffffffffffffff4 ffff810112f05d28
ffff810179923cb8 ffff810112f05d28 ffff810112f05e68 ffffffff80160226
0000000000000292 ffffffff80193831
Call Trace:<ffffffff80160226>{kmem_cache_alloc+54} <ffffffff80193831>{d_alloc+33}
<ffffffff80188fe9>{real_lookup+105} <ffffffff801893c0>{do_lookup+112}
<ffffffff80189e07>{__link_path_walk+2551} <ffffffff8018a382>{link_path_walk+178}
<ffffffff8018a8ce>{path_lookup+446} <ffffffff8018aa9e>{__user_walk+62}
<ffffffff801849b6>{vfs_lstat+38} <ffffffff80184dff>{sys_newlstat+31}
<ffffffff8010db7a>{system_call+126}

Code: 48 89 50 08 48 89 02 48 c7 46 08 00 02 20 00 83 7e 24 ff 48
RIP <ffffffff8015fdba>{cache_alloc_refill+330} RSP <ffff810112f05c28>
CR2: 00002aaaab9c5b61
NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: bonding
Pid: 7, comm: events/1 Tainted: G B 2.6.14.2 #1
RIP: 0010:[<ffffffff803837dd>] <ffffffff803837dd>{.text.lock.spinlock+118}
RSP: 0018:ffff810004869dd0 EFLAGS: 00000086
RAX: ffff81000c01a240 RBX: ffff81000c01a288 RCX: ffff8100f6fc3640
RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff81000c01a288
RBP: ffff810100009dc0 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000066 R12: 0000000000000000
R13: ffff810100009dd0 R14: 0000000000000292 R15: ffff810100009e40
FS: 00002aaaaae00640(0000) GS:ffffffff804db880(0000) knlGS:00000000556b6920
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaaaf1df40 CR3: 000000017f448000 CR4: 00000000000006e0
Process events/1 (pid: 7, threadinfo ffff810004868000, task ffff8100f6fb6080)
Stack: ffffffff8015e35b ffff8100f6fc3640 ffff810100009f60 0000000000000001
ffff810100009e40 ffff8100f6fc3640 ffff8100f6fc38e0 ffff810100009f88
ffffffff80161414 ffff810004869e58
Call Trace:<ffffffff8015e35b>{drain_alien_cache+123} <ffffffff80161414>{cache_reap+164}
<ffffffff80161370>{cache_reap+0} <ffffffff8014553c>{worker_thread+476}
<ffffffff8012ed70>{default_wake_function+0} <ffffffff8012ed70>{default_wake_function+0}
<ffffffff80145360>{worker_thread+0} <ffffffff80149c82>{kthread+146}
<ffffffff8010ea02>{child_rip+8} <ffffffff80145360>{worker_thread+0}
<ffffffff80149bf0>{kthread+0} <ffffffff8010e9fa>{child_rip+0}


Code: 80 3f 00 7e f9 e9 59 fe ff ff e8 58 41 e9 ff e9 6f fe ff ff
console shuts up ...
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/