Re: [PATCH printk v1 16/18] kernel/panic: Add atomic write enforcement to warn/panic

From: Petr Mladek
Date: Thu Apr 13 2023 - 06:08:15 EST


On Thu 2023-03-02 21:02:16, John Ogness wrote:
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> Invoke the atomic write enforcement functions for warn/panic to
> ensure that the information gets out to the consoles.
>
> For the panic case, add explicit intermediate atomic flush calls to
> ensure immediate flushing at important points. Otherwise the atomic
> flushing only occurs when dropping out of the elevated priority,
> which for panic may never happen.
>
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of the
> atomic consoles. Optimally there should be no legacy consoles
> registered.
>
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -329,6 +332,8 @@ void panic(const char *fmt, ...)
> if (_crash_kexec_post_notifiers)
> __crash_kexec(NULL);
>
> + cons_atomic_flush(NULL, true);

Do we need to explicitly flush the messages here?

cons_atomic_flush() is called also from vprintk_emit(). And there are
many messages printed with the PANIC priority above.

This makes an assumption that either printk() in PANIC context
does not try to show the messages immediately or that this
explicit console_atomic_flush() tries harder. I think
that both assumptions are wrong.

> +
> console_unblank();
>
> /*
> @@ -353,6 +358,7 @@ void panic(const char *fmt, ...)
> * We can't use the "normal" timers since we just panicked.
> */
> pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
> + cons_atomic_flush(NULL, true);

Same here.
>
> for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
> touch_nmi_watchdog();
> @@ -371,6 +377,7 @@ void panic(const char *fmt, ...)
> */
> if (panic_reboot_mode != REBOOT_UNDEFINED)
> reboot_mode = panic_reboot_mode;
> + cons_atomic_flush(NULL, true);

And here.

> emergency_restart();
> }
> #ifdef __sparc__
> @@ -383,12 +390,16 @@ void panic(const char *fmt, ...)
> }
> #endif
> #if defined(CONFIG_S390)
> + cons_atomic_flush(NULL, true);

And here.

> disabled_wait();
> #endif
> pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
>
> /* Do not scroll important messages printed above */
> suppress_printk = 1;
> +
> + cons_atomic_exit(CONS_PRIO_PANIC, prev_prio);

On the contrary, I would explicitly call cons_atomic_flush(NULL, false)
here instead of hiding it in cons_atomic_exit().

It would make it clear that this is the POINT where panic() tries
harder to get the messages out on NOBKL consoles.

> +
> local_irq_enable();
> for (i = 0; ; i += PANIC_TIMER_STEP) {
> touch_softlockup_watchdog();
> @@ -599,6 +610,10 @@ struct warn_args {
> void __warn(const char *file, int line, void *caller, unsigned taint,
> struct pt_regs *regs, struct warn_args *args)
> {
> + enum cons_prio prev_prio;
> +
> + prev_prio = cons_atomic_enter(CONS_PRIO_EMERGENCY);
> +
> disable_trace_on_warning();
>
> if (file)
> @@ -630,6 +645,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
>
> /* Just a warning, don't kill lockdep. */
> add_taint(taint, LOCKDEP_STILL_OK);
> +
> + cons_atomic_exit(CONS_PRIO_EMERGENCY, prev_prio);
> }

I would more this into separate patch and keep this one only for the
PANIC handling.

Also I think that we want to set the EMERGENCY prio also in oops_enter()?

Best Regards,
Petr