Re: [PATCH] kgdb: Flush console before entering kgdb on panic

From: Doug Anderson
Date: Fri Aug 25 2023 - 10:20:16 EST


Hi,

On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
<daniel.thompson@xxxxxxxxxx> wrote:
>
> On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > console isn't flushed before the `kdb` prompt came up. Specifically,
> > when using the buddy lockup detector on arm64 and running:
> > echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> >
> > I could see:
> > [ 26.161099] lkdtm: Performing direct entry HARDLOCKUP
> > [ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > [ 32.552865] Sending NMI from CPU 5 to CPUs 6:
> > [ 32.557359] NMI backtrace for cpu 6
> > ... [backtrace for cpu 6] ...
> > [ 32.558353] NMI backtrace for cpu 5
> > ... [backtrace for cpu 5] ...
> > [ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> > [ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> >
> > Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> > [5]kdb>
> >
> > As you can see, backtraces for the other CPUs start printing and get
> > interleaved with the kdb PANIC print.
> >
> > Let's replicate the commands to flush the console in the kdb panic
> > entry point to avoid this.
> >
> > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > ---
> >
> > kernel/debug/debug_core.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > index d5e9ccde3ab8..3a904d8697c8 100644
> > --- a/kernel/debug/debug_core.c
> > +++ b/kernel/debug/debug_core.c
> > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> > if (panic_timeout)
> > return;
> >
> > + debug_locks_off();
> > + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > +
> > if (dbg_kdb_mode)
> > kdb_printf("PANIC: %s\n", msg);
>
> I'm somewhat included to say *this* (calling kdb_printf() when not
> actually in the debugger) is the cause of the problem. kdb_printf()
> does some pretty horid things to the console and isn't intended to
> run while the system is active.
>
> I'd therefore be more tempted to defer the print to the b.p. trap
> handler itself and make this part of kgdb_panic() look more like:
>
> kgdb_panic_msg = msg;
> kgdb_breakpoint();
> kgdb_panic_msg = NULL;

Unfortunately I think that only solves half the problem. As a quick
test, I tried simply commenting out the "kdb_printf" line in
kgdb_panic(). While that avoids the interleaved panic message and
backtrace, it does nothing to actually get the backtraces printed out
before you end up in kdb. As an example, this is what happened when I
used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
had the "kdb_printf" in kgdb_panic() commented out:

[ 72.658424] lkdtm: Performing direct entry HARDLOCKUP
[ 82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
...
[ 82.234801] Sending NMI from CPU 5 to CPUs 6:
[ 82.239296] NMI backtrace for cpu 6
... [ stack trace for CPU 6 ] ...
[ 82.240294] NMI backtrace for cpu 5
... [ stack trace for CPU 5 ] ...
[ 82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
[ 82.581291] NMI backtrace
Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due
to Keyboard Entry
[5]kdb>

As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
show up if I use the "dmesg" command but it's a bit of a hassle to run
"dmesg" to look for any extra debug messages every time I drop in kdb.

I guess perhaps that part isn't obvious from the commit message?
Should I send a new version with an updated commit message indicating
that it's not just the jumbled text that's a problem but also the lack
of stack traces?

Thanks!

-Doug