Re: [BUG] wireless : cpu stuck for 61s

From: Jiri Slaby
Date: Mon Aug 04 2008 - 05:22:29 EST


Dave Young napsal(a):
On Thu, Jul 31, 2008 at 5:15 PM, Pekka J Enberg <penberg@xxxxxxxxxxxxxx> wrote:
On Wed, 30 Jul 2008, Andrew Morton wrote:
INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0
INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398
So the corrupted object was free'd by skb_release_data() so we need to
look for a driver or the networking stack calling that function too early.

INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3
INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030

Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÃ..ZZZZZZZZ
The object starts here (the poison values for first 32 bytes are okay):

Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
And here are the first 96 bytes of the corruption:

Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÃÃÃÃÃÃ..{.F@
Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@xxx!.z!...
Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............
Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........
Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`
Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÃ...{..........
But I think that's just the payload of a SKB?

It's a receive frame from ath5k, I suppose. 00:17:7b:00:46:40 is your AP?

Redzone 0xf6f3b060: bb bb bb bb ÂÂÂÂ
The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces
use-after-free.

Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2
[<c0180f5d>] print_trailer+0xad/0xf0
[<c018103b>] check_bytes_and_report+0x9b/0xc0
[<c018145e>] check_object+0x19e/0x1e0
[<c01821a4>] __slab_alloc+0x454/0x4f0
[<c01834d6>] __kmalloc_track_caller+0xe6/0xf0
[<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
[<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
[<c03dce79>] __alloc_skb+0x49/0x100
[<c03dd1ec>] dev_alloc_skb+0x1c/0x30
[<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k]
[<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k]
[<c014969a>] ? print_lock_contention_bug+0x1a/0xe0
[<c012eafc>] tasklet_action+0x4c/0xc0
[...]
=======================
FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b

Dave, could you please remind us which net driver was in use here?
There's ath5k in the stack trace but that, of course, doesn't
automatically mean it's at fault here. It could have been just the poor
bastard who was the next to allocate 4 KB with kmalloc() noticing the
corruption.

No, unfortunately ath5k *is* likely the culprit. Next time please Cc ath5k-devel@xxxxxxxxxxxxxxx even if it is only a suspicion.

But I still have no idea with the poison overwritten.

Could you try patch from
http://lkml.org/lkml/2008/7/15/276
? (I have no idea how reproducible is this by you, it often happens on noisy channels and/or by lowering RX buffers, i.e. ATH_RXBUF).

[It hit mainline few days ago, I'm going to fwd it to stable.]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/