Re: [PATCH] kgdb: Flush console before entering kgdb on panic

From: Daniel Thompson
Date: Wed Aug 30 2023 - 15:08:29 EST


On Fri, Aug 25, 2023 at 07:18:44AM -0700, Doug Anderson wrote:
> Hi,
>
> On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
> <daniel.thompson@xxxxxxxxxx> wrote:
> >
> > On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > > When entering kdb/kgdb on a kernel panic, it was observed that the
> > > console isn't flushed before the `kdb` prompt comes up. Specifically,
> > > when using the buddy lockup detector on arm64 and running:
> > > echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> > >
> > > I could see:
> > > [ 26.161099] lkdtm: Performing direct entry HARDLOCKUP
> > > [ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > [ 32.552865] Sending NMI from CPU 5 to CPUs 6:
> > > [ 32.557359] NMI backtrace for cpu 6
> > > ... [backtrace for cpu 6] ...
> > > [ 32.558353] NMI backtrace for cpu 5
> > > ... [backtrace for cpu 5] ...
> > > [ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > [ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> > >
> > > Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> > > [5]kdb>
> > >
> > > As you can see, backtraces for the other CPUs start printing and get
> > > interleaved with the kdb PANIC print.
> > >
> > > Let's replicate the commands to flush the console in the kdb panic
> > > entry point to avoid this.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > > ---
> > >
> > > kernel/debug/debug_core.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > > index d5e9ccde3ab8..3a904d8697c8 100644
> > > --- a/kernel/debug/debug_core.c
> > > +++ b/kernel/debug/debug_core.c
> > > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> > > 	if (panic_timeout)
> > > 		return;
> > >
> > > +	debug_locks_off();
> > > +	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > > +
> > > 	if (dbg_kdb_mode)
> > > 		kdb_printf("PANIC: %s\n", msg);
> >
> > I'm somewhat inclined to say *this* (calling kdb_printf() when not
> > actually in the debugger) is the cause of the problem. kdb_printf()
> > does some pretty horrid things to the console and isn't intended to
> > run while the system is active.
> >
> > I'd therefore be more tempted to defer the print to the b.p. trap
> > handler itself and make this part of kgdb_panic() look more like:
> >
> > kgdb_panic_msg = msg;
> > kgdb_breakpoint();
> > kgdb_panic_msg = NULL;
>
> Unfortunately I think that only solves half the problem. As a quick
> test, I tried simply commenting out the "kdb_printf" line in
> kgdb_panic(). While that avoids the interleaved panic message and
> backtrace, it does nothing to actually get the backtraces printed out
> before you end up in kdb. As an example, this is what happened when I
> used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
> had the "kdb_printf" in kgdb_panic() commented out:
>
> [ 72.658424] lkdtm: Performing direct entry HARDLOCKUP
> [ 82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
> ...
> [ 82.234801] Sending NMI from CPU 5 to CPUs 6:
> [ 82.239296] NMI backtrace for cpu 6
> ... [ stack trace for CPU 6 ] ...
> [ 82.240294] NMI backtrace for cpu 5
> ... [ stack trace for CPU 5 ] ...
> [ 82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
> [ 82.581291] NMI backtrace
> Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due to Keyboard Entry
> [5]kdb>
>
> As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
> show up if I use the "dmesg" command but it's a bit of a hassle to run
> "dmesg" to look for any extra debug messages every time I drop into kdb.
>
> I guess perhaps that part isn't obvious from the commit message?

I figured it was a risk.

In fact, it's an area where my instinct to honour console messages and my
instinct to get into the debugger as soon as possible after the decision
to invoke it has been made come into conflict.

In other words, does it matter that the console buffers are not flushed
before entering kgdb? However, having thought about it for a little while
(and knowing the console code tends to be written to be decently robust),
I have come to the view that flushing is best.


> Should I send a new version with an updated commit message indicating
> that it's not just the jumbled text that's a problem but also the lack
> of stack traces?

No real need.

I don't really like seeing kdb_printf() being called from here but,
having reviewed a bit of console code, I think we might be able
to use the new infrastructure to make kdb_printf() slightly less
hateful ;-).
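
For illustration only, here is a rough userspace sketch of how the two
ideas in this thread might combine: flush whatever is pending first, then
hand the panic message to the trap handler rather than printing it from
kgdb_panic(). This is not kernel code. Only kgdb_panic_msg comes from the
suggestion quoted above; every mock_* name, the buffer sizes and the
prompt string are assumptions made up for the example.

```c
/* Userspace mock, NOT kernel code. All mock_* names are hypothetical. */
#include <string.h>

/* Set only around the trap, as in the suggestion quoted in this thread. */
static const char *kgdb_panic_msg;

static char mock_console[256];  /* stands in for what has reached the console */
static char mock_pending[128];  /* stands in for buffered, unflushed records */

/* Append a string to the mock console, truncating if it would overflow. */
static void mock_console_write(const char *s)
{
	strncat(mock_console, s, sizeof(mock_console) - strlen(mock_console) - 1);
}

/* Stand-in for the flush step: push buffered records out first. */
static void mock_flush_pending(void)
{
	mock_console_write(mock_pending);
	mock_pending[0] = '\0';
}

/* Stand-in for the breakpoint trap handler, where the debugger owns the console. */
static void mock_breakpoint(void)
{
	if (kgdb_panic_msg) {
		mock_console_write("PANIC: ");
		mock_console_write(kgdb_panic_msg);
		mock_console_write("\n");
	}
	mock_console_write("[5]kdb> ");
}

/* Flush first, then defer the message to the trap handler. */
static void mock_kgdb_panic(const char *msg)
{
	mock_flush_pending();
	kgdb_panic_msg = msg;
	mock_breakpoint();
	kgdb_panic_msg = NULL;
}
```

With a backtrace line sitting in mock_pending, calling
mock_kgdb_panic("Hard LOCKUP") leaves the backtrace, then the PANIC line,
then the prompt in mock_console, in that order and without interleaving.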


Daniel.