Re: [PATCH] kgdb: Flush console before entering kgdb on panic

From: Daniel Thompson
Date: Mon Oct 16 2023 - 10:00:28 EST

Next message: Michael Roth: "[PATCH v10 08/50] x86/fault: Add helper for dumping RMP entries"
Previous message: Hugo Villeneuve: "Re: [PATCH 2/2] dt-bindings: serial: sc16is7xx: remove 'clock-frequency' property"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Sep 20, 2023 at 02:01:30PM +0200, Petr Mladek wrote:
> On Wed 2023-08-30 10:53:59, Daniel Thompson wrote:
> > On Fri, Aug 25, 2023 at 07:18:44AM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
> > > <daniel.thompson@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > > > > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > > > > console isn't flushed before the `kdb` prompt came up. Specifically,
> > > > > when using the buddy lockup detector on arm64 and running:
> > > > > echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> > > > >
> > > > > I could see:
> > > > > [ 26.161099] lkdtm: Performing direct entry HARDLOCKUP
> > > > > [ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > > > [ 32.552865] Sending NMI from CPU 5 to CPUs 6:
> > > > > [ 32.557359] NMI backtrace for cpu 6
> > > > > ... [backtrace for cpu 6] ...
> > > > > [ 32.558353] NMI backtrace for cpu 5
> > > > > ... [backtrace for cpu 5] ...
> > > > > [ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > > > [ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> > > > >
> > > > > Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> > > > > [5]kdb>
> > > > >
> > > > > As you can see, backtraces for the other CPUs start printing and get
> > > > > interleaved with the kdb PANIC print.
> > > > >
> > > > > Let's replicate the commands to flush the console in the kdb panic
> > > > > entry point to avoid this.
> > > > >
> > > > > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > > > > ---
> > > > >
> > > > > kernel/debug/debug_core.c | 3 +++
> > > > > 1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > > > > index d5e9ccde3ab8..3a904d8697c8 100644
> > > > > --- a/kernel/debug/debug_core.c
> > > > > +++ b/kernel/debug/debug_core.c
> > > > > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> > > > > if (panic_timeout)
> > > > > return;
> > > > >
> > > > > + debug_locks_off();
> > > > > + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > > > > +
> > > > > if (dbg_kdb_mode)
> > > > > kdb_printf("PANIC: %s\n", msg);
> > > >
> > > > I'm somewhat included to say *this* (calling kdb_printf() when not
> > > > actually in the debugger) is the cause of the problem. kdb_printf()
> > > > does some pretty horid things to the console and isn't intended to
> > > > run while the system is active.
> > > >
> > > > I'd therefore be more tempted to defer the print to the b.p. trap
> > > > handler itself and make this part of kgdb_panic() look more like:
> > > >
> > > > kgdb_panic_msg = msg;
> > > > kgdb_breakpoint();
> > > > kgdb_panic_msg = NULL;
> > >
> > > Unfortunately I think that only solves half the problem. As a quick
> > > test, I tried simply commenting out the "kdb_printf" line in
> > > kgdb_panic(). While that avoids the interleaved panic message and
> > > backtrace, it does nothing to actually get the backtraces printed out
> > > before you end up in kdb. As an example, this is what happened when I
> > > used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
> > > had the "kdb_printf" in kgdb_panic() commented out:
> > >
> > > [ 72.658424] lkdtm: Performing direct entry HARDLOCKUP
> > > [ 82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > ...
> > > [ 82.234801] Sending NMI from CPU 5 to CPUs 6:
> > > [ 82.239296] NMI backtrace for cpu 6
> > > ... [ stack trace for CPU 6 ] ...
> > > [ 82.240294] NMI backtrace for cpu 5
> > > ... [ stack trace for CPU 5 ] ...
> > > [ 82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > [ 82.581291] NMI backtrace
> > > Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due
> > > to Keyboard Entry
> > > [5]kdb>
> > >
> > > As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
> > > show up if I use the "dmesg" command but it's a bit of a hassle to run
> > > "dmesg" to look for any extra debug messages every time I drop in kdb.
> > >
> > > I guess perhaps that part isn't obvious from the commit message?
> >
> > I figured it was a risk.
> >
> > In fact it's an area where my instinct to honour console messages and my
> > instinct to get into the kernel as soon as possible after the decision
> > to invoke it has been made come into conflict.
> >
> > In other words does it matter that the console buffers are not flushed
> > before entering kgdb? However having thought about it for a little while
> > (and knowing the console code tends to be written to be decently robust)
> > I can come to the view the flushing is best.
>
> Just for record. I wondered if there was bug in printk() that the
> backtraces were not flushed. And I think that it is because they
> are printed in NMI context where the conosle handling is deferred.
>
> From this POV, the explicit flush makes perfect sense.

Agreed... and (with apologies for the length of time to get this
done):
Applied!

Thanks

Daniel.

Next message: Michael Roth: "[PATCH v10 08/50] x86/fault: Add helper for dumping RMP entries"
Previous message: Hugo Villeneuve: "Re: [PATCH 2/2] dt-bindings: serial: sc16is7xx: remove 'clock-frequency' property"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]