Re: 2.6.24-rc6-mm1

From: Torsten Kaiser
Date: Tue Jan 01 2008 - 07:59:27 EST


On Jan 1, 2008 1:04 PM, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Dec 31, 2007 at 09:15:19PM +0100, Torsten Kaiser wrote:
> >
> > I then tried to "fix" it with this suspect.
> > I changed "skb_release_all(dst);" back to "skb_release_data(dst);" in
> > skb_morph() (net/core/skbuff.c).
>
> Check /proc/net/snmp to see if you're getting any fragments, if not
> then skb_morph shouldn't even be getting called.

OK, thanks for that hint.
I will look at this after my next tests.
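
A minimal sketch of how that check could be scripted, assuming the usual
/proc/net/snmp layout (an "Ip:" line of counter names followed by an "Ip:"
line of values). The helper names are made up for illustration, and treating
the Reasm* counters as the ones relevant to skb_morph() is my assumption:

```python
# Counters in the Ip: group that relate to fragmentation and reassembly.
FRAG_FIELDS = ("ReasmReqds", "ReasmOKs", "ReasmFails",
               "FragOKs", "FragFails", "FragCreates")


def ip_counters(snmp_text):
    """Return the Ip: counters from /proc/net/snmp text as a dict."""
    ip_lines = [line.split() for line in snmp_text.splitlines()
                if line.startswith("Ip:")]
    if len(ip_lines) < 2:
        raise ValueError("no 'Ip:' header/value line pair found")
    # First Ip: line holds the names, second holds the values.
    names, values = ip_lines[0][1:], ip_lines[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def fragment_counters(snmp_text):
    """Return only the fragmentation/reassembly counters (0 if absent)."""
    counters = ip_counters(snmp_text)
    return {name: counters.get(name, 0) for name in FRAG_FIELDS}


def print_fragment_counters(path="/proc/net/snmp"):
    """Read the given file and print the fragment-related counters."""
    with open(path) as f:
        for name, value in fragment_counters(f.read()).items():
            print(f"{name:12s} {value}")
```

Calling print_fragment_counters() before and after a test run and comparing
the numbers should show whether any reassembly happened in between.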

> > I'm now at 205 of 210 packages completed without a further hang. I
> > also do not see an obvious memory leak.
>
> In any case, I suspect the cause of your problem is that somebody
> somewhere is doing a double-free on an skb.

Is there any debug option I could turn on to catch this?

Hmm... __alloc_skb() uses kmem_cache_alloc_node(), and I did run
-rc3-mm2 for a long time with slub_debug=FZP without it catching
anything. Shouldn't the poisoning catch that? (Sorry if this question
is stupid; while I can read C, I'm not a kernel expert.)
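
One way to picture what poisoning and free-list checks can and cannot see is
the toy model below. It is only a sketch, not kernel code: ToyCache and its
methods are invented for illustration, the free list is a plain LIFO list,
and the only detail borrowed from the kernel is the 0x6b free-poison byte.
It suggests that a double free goes unnoticed if the object was handed out
again between the two frees:

```python
POISON_FREE = 0x6B  # byte pattern written over freed objects


class ToyCache:
    """Toy object cache with poisoning and an 'already free' check."""

    def __init__(self, nobjs, size):
        self.size = size
        self.mem = [bytearray([POISON_FREE] * size) for _ in range(nobjs)]
        self.freelist = list(range(nobjs))  # LIFO: last freed is reused first

    def alloc(self):
        idx = self.freelist.pop()
        # Poison check: a write to a freed object shows up here.
        if any(b != POISON_FREE for b in self.mem[idx]):
            raise RuntimeError(f"object {idx}: poison overwritten")
        self.mem[idx][:] = bytes(self.size)  # hand out zeroed memory
        return idx

    def release(self, idx):
        # Sanity check: freeing an object that is already on the free list.
        if idx in self.freelist:
            raise RuntimeError(f"object {idx}: already free")
        self.mem[idx][:] = bytes([POISON_FREE] * self.size)
        self.freelist.append(idx)


def double_free_back_to_back():
    """Free the same object twice in a row; return True if it was caught."""
    cache = ToyCache(2, 8)
    obj = cache.alloc()
    cache.release(obj)
    try:
        cache.release(obj)
    except RuntimeError:
        return True
    return False


def double_free_after_reuse():
    """Free, let another user reallocate, free again; True if caught."""
    cache = ToyCache(2, 8)
    obj = cache.alloc()
    cache.release(obj)
    other = cache.alloc()      # another user gets the same object back
    assert other == obj
    try:
        cache.release(obj)     # stale second free looks like a valid free
    except RuntimeError:
        return True
    return False
```

In this model the back-to-back case is caught, while the free-after-reuse
case is not: the second free is indistinguishable from a legitimate one, and
the other user is left holding an object that sits on the free list.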

> Since you're the only person who can reproduce this, we really need
> your help to track this down. Since bisecting the mm tree is not
> practical, you could start by checking whether the bug is in mm only
> or whether it affects rc6 too.

I will try -rc6-mm1 and vanilla -rc6 and report back.

Torsten