Re: [BUG] 2.6.3 Slab corruption: errors are triggered when memoryexceeds 2.5GB (correction)

From: Manfred Spraul
Date: Wed Feb 25 2004 - 01:24:21 EST


Darren Williams wrote:

Hi Manfred

I have updated to the latest bk and new output can be found at:
http://quasar.cse.unsw.edu.au/~dsw/public-files/lemon-debug/
kern-log-bk

Also I am quite confident that it is not a hardware problem.

I took a look at alloc_skb(..) and there is a reference to
an atomic_t token with this being the most suspect

150> atomic_set(&(skb_shinfo(skb)->dataref), 1);


I don't think so:
The allocation that generates the error is skb->head: The cache name is "size-2048", thus the allocation is a kmalloc(1000-2000, probably 1536 for one eth frame). The skb itself is allocated from the skbuff_head_cache.

I don't see a pattern in the virtual addresses:
start=e000000101ee09a0, len=2048
start=e000000101ee09a0, len=2048
start=e000000101ee11b8, len=2048
start=e000000101ee19d0, len=2048
start=e000000101ee3218, len=2048
start=e00000017eed1b90, len=2048
start=e00000017eed23a8, len=2048
start=e00000017eed2bc0, len=2048
start=e00000017eed4308, len=2048
start=e00000017eed5338, len=2048
start=e00000017eed5338, len=2048
start=e00000017eed5b50, len=2048
start=e00000017eed5b50, len=2048
start=e00000017eed6b80, len=2048
start=e00000017eed82c8, len=2048
start=e00000017eedc288, len=2048
start=e00000017eedcaa0, len=2048
start=e00000017eeddad0, len=2048
start=e00000017eede2e8, len=2048
start=e00000017eedeb00, len=2048
start=e00000017ef60a60, len=2048
start=e00000017ef61a90, len=2048
start=e00000017ef622a8, len=2048
start=e00000017ef62ac0, len=2048
start=e00000017ef632d8, len=2048
start=e00000017ef65a50, len=2048
start=e00000017ef65a50, len=2048
start=e00000017ef65a50, len=2048
start=e00000017ef66a80, len=2048

That virtually rules out a bad memory chip.

But the corrupted byte is always at offset 0x620 into the allocation:
Slab corruption: start=e00000017ef65a50, len=2048
620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
--
Slab corruption: start=e000000101ee19d0, len=2048
620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
--
Slab corruption: start=e000000101ee3218, len=2048
620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
--
Slab corruption: start=e00000017ef66a80, len=2048
620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
--
Slab corruption: start=e000000101ee11b8, len=2048
620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

0x620 (1568) is behind the end of the actual eth frame. Who could modify that?

Darren, which nic do you use? Could you try what happens if you reduce the MTU?

--
Manfred

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/