Re: [PATCH printk v2 21/26] printk: Coordinate direct printing in panic

From: Petr Mladek
Date: Fri Mar 01 2024 - 08:05:19 EST


On Sun 2024-02-18 20:03:21, John Ogness wrote:
> Perform printing by nbcon consoles on the panic CPU from the
> printk() caller context in order to get panic messages printed
> as soon as possible.
>
> If legacy and nbcon consoles are registered, the legacy consoles
> will no longer perform direct printing on the panic CPU until
> after the backtrace has been stored. This will give the safe
> nbcon consoles a chance to print the panic messages before
> allowing the unsafe legacy consoles to print.
>
> If no nbcon consoles are registered, there is no change in
> behavior (i.e. legacy consoles will always attempt to print
> from the printk() caller context).

> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -370,6 +370,8 @@ void panic(const char *fmt, ...)
> */
> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>
> + printk_legacy_allow_panic_sync();

I would call this before the panic notifiers. They are known
to cause problems.

It was the reason to introduce "crash_kexec_post_notifiers" parameter.
Also there is a patchset which tries to somehow split them
by purpose, see
https://lore.kernel.org/all/20220427224924.592546-23-gpiccoli@xxxxxxxxxx/

It brings another question whether to try flushing the legacy consoles
before calling the notifiers.

> panic_print_sys_info(false);
>
> kmsg_dump(KMSG_DUMP_PANIC);
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2329,12 +2329,23 @@ int vprintk_store(int facility, int level,
> return ret;
> }
>
> +static bool legacy_allow_panic_sync;
> +
> +/*
> + * This acts as a one-way switch to allow legacy consoles to print from
> + * the printk() caller context on a panic CPU.
> + */
> +void printk_legacy_allow_panic_sync(void)
> +{
> + legacy_allow_panic_sync = true;

I would flush the legacy consoles here. Otherwise it might be done
by another random printk() from the notifiers or by the even more
unsafe printk_console_flush_in_panic().

I mean to do something like:

if (printing_via_unlock && console_trylock)
console_unlock();

> +}
> +
> asmlinkage int vprintk_emit(int facility, int level,
> const struct dev_printk_info *dev_info,
> const char *fmt, va_list args)
> {
> + bool do_trylock_unlock = printing_via_unlock;
> int printed_len;
> - bool in_sched = false;
>
> /* Suppress unimportant messages after panic happens */
> if (unlikely(suppress_printk))
> @@ -2350,15 +2361,43 @@ asmlinkage int vprintk_emit(int facility, int level,
>
> if (level == LOGLEVEL_SCHED) {
> level = LOGLEVEL_DEFAULT;
> - in_sched = true;
> + /* If called from the scheduler, we can not call up(). */
> + do_trylock_unlock = false;
> }
>
> printk_delay(level);
>
> printed_len = vprintk_store(facility, level, dev_info, fmt, args);
>
> - /* If called from the scheduler, we can not call up(). */
> - if (!in_sched && printing_via_unlock) {
> + if (!have_boot_console && have_nbcon_console) {

Nit: The opposite order is more logic ;-)

if (have_nbcon_console && !have_boot_console) {

> + bool is_panic_context = this_cpu_in_panic();
> +
> + /*
> + * In panic, the legacy consoles are not allowed to print from
> + * the printk calling context unless explicitly allowed. This
> + * gives the safe nbcon consoles a chance to print out all the
> + * panic messages first. This restriction only applies if
> + * there are nbcon consoles registered.
> + */
> + if (is_panic_context)
> + do_trylock_unlock &= legacy_allow_panic_sync;
> +
> + /*
> + * There are situations where nbcon atomic printing should
> + * happen in the printk() caller context:
> + *
> + * - When this CPU is in panic.
> + *
> + * Note that if boot consoles are registered, the
> + * console_lock/console_unlock dance must be relied upon
> + * instead because nbcon consoles cannot print simultaneously
> + * with boot consoles.
> + */
> + if (is_panic_context)
> + nbcon_atomic_flush_all();
> + }
> +
> + if (do_trylock_unlock) {
> /*
> * The caller may be holding system-critical or
> * timing-sensitive locks. Disable preemption during

Otherwise, it looks good.

Best Regards,
Petr