Re: [PATCH] kgdb: Don't use a notifier to enter kgdb at panic; call directly

From: Daniel Thompson
Date: Tue Jul 09 2019 - 10:59:51 EST


On Wed, Jul 03, 2019 at 10:03:54AM -0700, Douglas Anderson wrote:
> Right now kgdb/kdb hooks up to debug panics by registering for the
> panic notifier. This works OK except that it means that kgdb/kdb gets
> called _after_ the CPUs in the system are taken offline. That means
> that if anything important was happening on those CPUs (like something
> that might have contributed to the panic) you can't debug them.
>
> Specifically I ran into a case where I got a panic because a task was
> "blocked for more than 120 seconds" which was detected on CPU 2. I
> nicely got shown stack traces in the kernel log for all CPUs including
> CPU 0, which was running 'PID: 111 Comm: kworker/0:1H' and was in the
> middle of __mmc_switch().
>
> I then ended up at the kdb prompt where I switched over to kgdb to try
> to look at local variables of the process on CPU 0. I found that I
> couldn't. Digging more, I found that I had no info on any tasks
> running on CPUs other than CPU 2 and that asking kdb for help showed
> me "Error: no saved data for this cpu". This was because all the CPUs
> were offline.
>
> Let's move the entry of kdb/kgdb to a direct call from panic() and
> stop using the generic notifier. Putting a direct call in allows us
> to order things more properly and it also doesn't seem like we're
> breaking any abstractions by calling into the debugger from the panic
> function.
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>

This patch changes the way kdump and kgdb interact with each other:
kgdb now gets control before the crash kernel is given a chance to run.
However, it would seem rather odd to have both tools armed
simultaneously and, even if they were, the user still has the option
to use panic_timeout to force a kdump to happen. Thus I think the
change of order is acceptable:

Reviewed-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx>


Daniel.


> diff --git a/kernel/panic.c b/kernel/panic.c
> index 4d9f55bf7d38..e2971168b059 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -12,6 +12,7 @@
>  #include <linux/debug_locks.h>
>  #include <linux/sched/debug.h>
>  #include <linux/interrupt.h>
> +#include <linux/kgdb.h>
>  #include <linux/kmsg_dump.h>
>  #include <linux/kallsyms.h>
>  #include <linux/notifier.h>
> @@ -219,6 +220,13 @@ void panic(const char *fmt, ...)
>  		dump_stack();
>  #endif
>
> +	/*
> +	 * If kgdb is enabled, give it a chance to run before we stop all
> +	 * the other CPUs or else we won't be able to debug processes left
> +	 * running on them.
> +	 */
> +	kgdb_panic(buf);
> +
>  	/*
>  	 * If we have crashed and we have a crash kernel loaded let it handle
>  	 * everything else.
> --
> 2.22.0.410.gd8fdbe21b5-goog
>
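
For anyone following the ordering argument, here is a small userspace toy
model of why the call site matters. It is only a sketch and not kernel
code: NR_CPUS, struct toy_system and every toy_*() function are
hypothetical names invented for illustration. The only thing it models is
whether the debugger is entered before or after the other CPUs are
stopped.

```c
#include <assert.h>

#define NR_CPUS 4 /* hypothetical CPU count for the toy model */

/* Toy model state; none of this exists in the kernel. */
struct toy_system {
	int online[NR_CPUS];          /* 1 while a CPU is still running */
	int cpus_visible_to_debugger; /* recorded when the debugger is entered */
};

static int toy_count_online(const struct toy_system *sys)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		n += sys->online[cpu];
	return n;
}

/* Stands in for entering kgdb/kdb: record how many CPUs it can still see. */
static void toy_debugger_enter(struct toy_system *sys)
{
	sys->cpus_visible_to_debugger = toy_count_online(sys);
}

/* Stands in for stopping every CPU except the one that panicked. */
static void toy_stop_other_cpus(struct toy_system *sys, int panic_cpu)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != panic_cpu)
			sys->online[cpu] = 0;
}

/*
 * direct_call != 0 models the patched order (debugger first);
 * direct_call == 0 models entry from the notifier chain, which in this
 * model runs only after the other CPUs have been stopped.
 * Returns the number of CPUs the debugger could inspect.
 */
int toy_panic(int panic_cpu, int direct_call)
{
	struct toy_system sys;
	int cpu;

	assert(panic_cpu >= 0 && panic_cpu < NR_CPUS);

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sys.online[cpu] = 1;
	sys.cpus_visible_to_debugger = 0;

	if (direct_call)
		toy_debugger_enter(&sys);

	toy_stop_other_cpus(&sys, panic_cpu);

	if (!direct_call)
		toy_debugger_enter(&sys);

	return sys.cpus_visible_to_debugger;
}
```

With a panic detected on CPU 2, toy_panic(2, 0) returns 1 (only the
panicking CPU is left to inspect) while toy_panic(2, 1) returns 4, which
mirrors the "no saved data for this cpu" symptom described in the commit
message and why the direct call avoids it.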