Re: [PATCH printk 10/18] kgbd: Pretend that console list walk is safe

From: Doug Anderson
Date: Wed Sep 28 2022 - 19:33:04 EST


Hi,

On Fri, Sep 23, 2022 at 5:05 PM John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
>
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> Provide a special list iterator macro for KGDB to allow unprotected list
> walks and add a few comments to explain the hope based approach.
>
> Preperatory change for changing the console list to hlist and adding

s/Preperatory/Preparatory

> lockdep asserts to regular list walks.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> Reviewed-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
> ---
> drivers/tty/serial/kgdboc.c | 5 ++++-
> include/linux/console.h | 10 ++++++++++
> kernel/debug/kdb/kdb_io.c | 7 ++++++-
> 3 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index af2aa76bae15..57a5fd27dffe 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -462,10 +462,13 @@ static void kgdboc_earlycon_pre_exp_handler(void)
> * we have no other choice so we keep using it. Since not all
> * serial drivers might be OK with this, print a warning once per
> * boot if we detect this case.
> + *
> + * Pretend that walking the console list is safe...

To be fair, this is not quite as unsafe as your comment makes it
sound. kgdb is a "stop the world" debugger and when this function is
executing then all of the other CPUs in the system should have been
rounded up and idle (or, perhaps, busy looping). Essentially as long
as console list manipulation is always made in a way that each
instruction keeps the list in a reasonable state then what kgdb is
doing is actually "safe". Said another way: we could drop into the
debugger at any point when a task is manipulating the console list,
but once we're in the debugger and are executing the "pre_exp_handler"
then all the other CPUs have been frozen in time.


> */
> - for_each_console(con)
> + for_each_console_kgdb(con) {
> if (con == kgdboc_earlycon_io_ops.cons)
> return;
> + }
>
> already_warned = true;
> pr_warn("kgdboc_earlycon is still using bootconsole\n");
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 24344f9b0bc1..86a6125512b9 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -187,6 +187,16 @@ extern void console_list_unlock(void) __releases(console_mutex);
> #define for_each_console(con) \
> for (con = console_drivers; con != NULL; con = con->next)
>
> +/**
> + * for_each_console_kgdb() - Iterator over registered consoles for KGDB
> + * @con: struct console pointer used as loop cursor
> + *
> + * Has no serialization requirements and KGDB pretends that this is safe.
> + * Don't use outside of the KGDB fairy tale land!
> + */
> +#define for_each_console_kgdb(con) \
> + for (con = console_drivers; con != NULL; con = con->next)
> +
> extern int console_set_on_cmdline;
> extern struct console *early_console;
>
> diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> index 67d3c48a1522..fb3775e61a3b 100644
> --- a/kernel/debug/kdb/kdb_io.c
> +++ b/kernel/debug/kdb/kdb_io.c
> @@ -558,7 +558,12 @@ static void kdb_msg_write(const char *msg, int msg_len)
> cp++;
> }
>
> - for_each_console(c) {
> + /*
> + * This is a completely unprotected list walk designed by the
> + * wishful thinking department. See the oops_in_progress comment
> + * below - especially the encourage section...

The reality is also a little less dire here than the comment suggests.
IMO this is actually not the same as the "oops_in_progress" case that
the comment refers to.

Specifically, the "oops_in_progress" is referring to the fact that
it's not uncommon to drop into the debugger when a serial driver (the
same one you're using for kgdb) is holding its lock. Possibly it's
printing something to the tty running on the UART dumping stuff out
from the kernel's console. That's not great and I won't pretend that
the kgdb design is amazing here, but...

Just like above, I don't feel like iterating through the console list
here without holding the lock is necessarily unsafe. Just like above,
all the rest of the CPUs in the system are in a holding pattern and
aren't actively executing any code. While we may have interrupted them
at any given instruction, they won't execute any more instruction
until we leave kgdb and resume running.