Re: [BUG] wireless : cpu stuck for 61s

From: Dave Young
Date: Wed Jul 30 2008 - 22:06:54 EST


On Wed, Jul 30, 2008 at 6:10 PM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 30 Jul 2008 18:02:03 +0800 "Dave Young" <hidave.darkstar@xxxxxxxxx> wrote:
>
>> On Wed, Jul 30, 2008 at 5:08 PM, Andrew Morton
>> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>> > On Tue, 29 Jul 2008 14:32:04 +0200 Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote:
>> >
>> >> On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote:
>> >> > While set my wlan0 to ad-hoc mode and then set the essid to something,
>> >> > this bug triggered.
>> >> >
>> >> > I suspect it's related to debugfs or mutex? But I'm not sure.
>> >> >
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493]
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3)
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02375a6>] ? debugfs_create_file+0x46/0x210
>> >>
>> >> It looks like it's getting stuck on
>> >> mutex_lock(&parent->d_inode->i_mutex);
>> >>
>> >> inside debugfs_create_by_name, but I have no idea why that would happen.
>> >>
>> >
>> > Dave, please try sysrq-W or sysrq-T during that 60 seconds, see if we
>> > can find another task which is holding onto that lock.
>> >
>> >
>>
>> Andrew, the keyboard doesn't response after the bug triggered.
>>
>> So I just keep sysrq-W / sysrq-T every several seconds after I set the
>> wlan0 interface.
>> Attached please see the long log file.
>
> argh. Look:
>
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] f703fbf8 00000082 00000002 f703fbe4 f703fbd8 00000000 f704ac40 f703fbcc
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000046 c0669060 c066c580 c066c580 c066c580 f703fbe8 f704ad94 c1eaf580
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000001 f704ac40 00000000 f702ca00 c0452697 ffff55ba 00000000 c014b06d
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] Call Trace:
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0452697>] ? _spin_unlock_irqrestore+0x47/0x60
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0450265>] schedule_timeout+0x75/0xc0
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01942c0>] ? __pollwait+0x60/0xd0
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<f888b02e>] ? evdev_poll+0x2e/0x60 [evdev]
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0193564>] do_sys_poll+0x264/0x3a0
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0194260>] ? __pollwait+0x0/0xd0
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0122160>] ? default_wake_function+0x0/0x10
> Jul 30 17:34:22 darkstar last message repeated 3 times
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01220c5>] ? try_to_wake_up+0x75/0x110
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014aed0>] ? mark_held_locks+0x40/0x80
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c03da639>] ? sock_def_readable+0x59/0x80
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040
> Jul 30 17:34:22 darkstar last message repeated 5 times
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0109d0f>] ? native_sched_clock+0x7f/0xb0
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0243b4f>] ? security_file_permission+0xf/0x20
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0186387>] ? rw_verify_area+0x57/0xc0
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0451f5d>] ? _spin_unlock+0x1d/0x20
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0186b94>] ? vfs_read+0x94/0x130
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<f888be60>] ? evdev_read+0x0/0x1b0 [evdev]
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0103dbf>] ? restore_nocheck+0x12/0x15
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01936cd>] sys_poll+0x2d/0x70
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0103d5e>] syscall_call+0x7/0xb
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] =======================
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] f63efd94 00000092 00000002 00000000 00000000 f5064028 f7049d80 00000001
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000000 c0669060 c066c580 c066c580 c066c580 00000000 f7049ed4 c1e5d580
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000000 00000000 00000135 f702e080 f7049d80 00000003 00000000 f7049d80
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] Call Trace:
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0450265>] schedule_timeout+0x75/0xc0
> Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0436b98>] unix_stream_recvmsg+0x2a8/0x4e0
>
>
> We're missing part of the sysrq traces. I noticed this a while back.
> We need this info to find the D-state processes.
>
> It's _possible_ that the kernel is actually trying to print this info,
> but it's doing it at a different prink facility level from the stack
> output, so we're only seeing part of the printing.
>
> Please try changing your logging level (dmesg -n 0, or dmesg -n 7 - I
> always forget which). Then retry the sysrq-T. If that successfully
> causes _all_ the output to be presented, then please rerun the above
> test.
>
> You might want to compress the file too - I don't know whether the vger
> servers will accept a 730kB mail.
>
>

Ok here it is.
BTW, I run "klogd -c 7" after boot

This time I get a kmalloc poison overwritten:

Jul 31 09:51:11 darkstar kernel: [ 2024.960036] BUG kmalloc-4096:
Poison overwritten
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]
-----------------------------------------------------------------------------
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO:
0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Allocated in
dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Freed in
skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Slab 0xc1c05440
objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Object
0xf6f3a060 @offset=8288 fp=0xf6f39030
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Bytes b4 0xf6f3a050:
5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÃ..ZZZZZZZZ
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a060:
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a070:
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a080:
80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÃÃÃÃÃÃ..{.F@
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a090:
00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@xxx!.z!...
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0a0:
64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0b0:
84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0c0:
07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0d0:
6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÃ...{..........
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Redzone 0xf6f3b060:
bb bb bb bb ÂÂÂÂ
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Padding 0xf6f3b088:
5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Pid: 0, comm: swapper
Tainted: G W 2.6.26-smp #2
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c0180f5d>]
print_trailer+0xad/0xf0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c018103b>]
check_bytes_and_report+0x9b/0xc0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c018145e>]
check_object+0x19e/0x1e0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c01821a4>]
__slab_alloc+0x454/0x4f0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c01834d6>]
__kmalloc_track_caller+0xe6/0xf0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dd1ec>] ?
dev_alloc_skb+0x1c/0x30
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dd1ec>] ?
dev_alloc_skb+0x1c/0x30
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dce79>]
__alloc_skb+0x49/0x100
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dd1ec>]
dev_alloc_skb+0x1c/0x30
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<f8a58599>]
ath5k_rxbuf_setup+0x39/0x200 [ath5k]
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<f8a5a697>]
ath5k_tasklet_rx+0x127/0x5c0 [ath5k]
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c014969a>] ?
print_lock_contention_bug+0x1a/0xe0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012eafc>]
tasklet_action+0x4c/0xc0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012e463>]
__do_softirq+0x93/0x120
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012e547>]
do_softirq+0x57/0x60
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012ea29>] irq_exit+0x69/0x80
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c0106b55>] do_IRQ+0x45/0x80
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c010a5d0>] ?
mwait_idle+0x0/0x50
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c0104752>]
common_interrupt+0x2e/0x34
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c010a5d0>] ?
mwait_idle+0x0/0x50
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c010a609>] ?
mwait_idle+0x39/0x50
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c01026e0>] cpu_idle+0x60/0xd0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c043c8ce>]
rest_init+0x4e/0x60
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] =======================


The syslog detail please see the attachment.

--
Regards
dave

Attachment: dmesg.gz
Description: GNU Zip compressed data