Re: tracing kernel panics

From: Eric Dumazet
Date: Sun Jun 19 2011 - 04:01:07 EST


On Saturday 18 June 2011 at 19:12 -0500, Shane wrote:
> Anyone offer advice on how I should go about tracking down this kernel
> panic? Apologies if I've got the wrong list, let me know.
>
> I'm developing a networking module, my own protocol over TCP. I load
> my module, I make a network connection, and then I close it. I do
> nothing more, then after about 1-2 minutes, it throws this panic
> output and none of it seems to come from my module code. I know my
> code is the problem, and if I don't run my modules, I never get
> panics. But nothing in the stack trace I recognize from my code and
> I'm having a very hard time finding where in my code I've gone wrong. I
> run my kernel modules in a VM as a guest OS which connects to another
> guest OS (currently using 2.6.36-r5).
>
> I've read how to analyze an OOPS, but ... here I can't even find the
> file that this might belong to so as to disassemble it, or understand
> what device this is. Any suggestions/pointers/advice much appreciated?

Some general kernel programming advice:

It seems you have bugs in your module, maybe something like freeing
memory twice, freeing memory you don't own, or manipulating some data
without taking the lock that protects it.

TCP sockets are protected by RCU and various locks; using them
correctly is not an easy task.

You should take a look at various options in "Kernel hacking"
(some of them cannot be used together)

1) Using SLUB debug and "slub_debug=FZPU"
2) CONFIG_DEBUG_KMEMLEAK
3) CONFIG_PROVE_LOCKING
4) CONFIG_PROVE_RCU
5) debug linked list manipulations

and so on.
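
As a rough illustration of the list above, a kernel .config fragment
along the following lines would switch on the corresponding checks. This
is only a sketch: CONFIG_DEBUG_LIST is assumed to be the option meant by
item 5, CONFIG_SLUB and CONFIG_SLUB_DEBUG are assumed prerequisites for
item 1, and since some of these options cannot be used together you may
need to enable them a few at a time and check the result in
"make menuconfig".

```
# Item 1: SLUB allocator with debugging support compiled in
# (assumed prerequisites for the slub_debug= boot parameter)
CONFIG_SLUB=y
CONFIG_SLUB_DEBUG=y
# Item 2: kernel memory leak detector
CONFIG_DEBUG_KMEMLEAK=y
# Item 3: lock dependency checking
CONFIG_PROVE_LOCKING=y
# Item 4: RCU usage checking
CONFIG_PROVE_RCU=y
# Item 5: assumed to mean linked list debugging
CONFIG_DEBUG_LIST=y

# Item 1 also needs this on the kernel command line of the guest:
#   slub_debug=FZPU
# F = sanity checks, Z = red zoning, P = poisoning,
# U = user tracking (records who allocated/freed each object)
```

With poisoning and user tracking enabled, a double free or use after
free in the module is more likely to be reported close to the faulty
call, instead of showing up minutes later in unrelated code.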


