Re: flush was: Re: [PATCH printk v2 22/26] printk: nbcon: Implement emergency sections

From: Petr Mladek
Date: Fri Mar 01 2024 - 11:12:21 EST


On Fri 2024-03-01 16:49:54, Petr Mladek wrote:
> On Sun 2024-02-18 20:03:22, John Ogness wrote:
> > From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> >
> > In emergency situations (something has gone wrong but the
> > system continues to operate), usually important information
> > (such as a backtrace) is generated via printk(). Each
> > individual printk record has little meaning. It is the
> > collection of printk messages that is most often needed by
> > developers and users.
> >
> > In order to help ensure that the collection of printk messages
> > in an emergency situation are all stored to the ringbuffer as
> > quickly as possible, disable console output for that CPU while
> > it is in the emergency situation. When exiting the emergency
> > situation, trigger the consoles to be flushed.
> >
> > Add per-CPU emergency nesting tracking because an emergency
> > can arise while in an emergency situation.
> >
> > Add functions to mark the beginning and end of emergency
> > sections where the urgent messages are generated.
> >
> > Do not print if the current CPU is in an emergency state.
> >
> > Trigger console flushing when exiting all emergency nesting.
> >
> > Note that the emergency state is not system-wide. While one CPU
> > is in an emergency state, another CPU may continue to print
> > console messages.
> >
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -1105,6 +1134,58 @@ void nbcon_atomic_flush_unsafe(void)
> > __nbcon_atomic_flush_all(prb_next_reserve_seq(prb), true);
> > }
> >
> > +/**
> > + * nbcon_cpu_emergency_enter - Enter an emergency section where printk()
> > + * messages for that CPU are only stored
> > + *
> > + * Upon exiting the emergency section, all stored messages are flushed.
> > + *
> > + * Context: Any context. Disables preemption.
> > + *
> > + * When within an emergency section, no printing occurs on that CPU. This
> > + * is to allow all emergency messages to be dumped into the ringbuffer before
> > + * flushing the ringbuffer. The actual printing occurs when exiting the
> > + * outermost emergency section.
> > + */
> > +void nbcon_cpu_emergency_enter(void)
> > +{
> > +	unsigned int *cpu_emergency_nesting;
> > +
> > +	preempt_disable();
> > +
> > +	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
> > +	(*cpu_emergency_nesting)++;
> > +}
> > +
> > +/**
> > + * nbcon_cpu_emergency_exit - Exit an emergency section and flush the
> > + * stored messages
> > + *
> > + * Flushing only occurs when exiting all nesting for the CPU.
> > + *
> > + * Context: Any context. Enables preemption.
> > + */
> > +void nbcon_cpu_emergency_exit(void)
> > +{
> > +	unsigned int *cpu_emergency_nesting;
> > +	bool do_trigger_flush = false;
> > +
> > +	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
> > +
> > +	WARN_ON_ONCE(*cpu_emergency_nesting == 0);
> > +
> > +	if (*cpu_emergency_nesting == 1)
> > +		do_trigger_flush = true;
> > +
> > +	/* Undo the nesting count of nbcon_cpu_emergency_enter(). */
> > +	(*cpu_emergency_nesting)--;
> > +
> > +	preempt_enable();
> > +
> > +	if (do_trigger_flush)
> > +		printk_trigger_flush();
>
> Just an idea. printk_trigger_flush() calls defer_console_output().
> It always moves the flushing into IRQ context.
>
> It might make sense to add a flush() function which would try
> to flush the messages directly from this context and
> queue the IRQ work only when it fails. Something like:
>
> void printk_emergency_flush()
> {
> 	/* nbcon consoles could be flushed in any context. */
> 	if (have_nbcon_console)
> 		nbcon_flush_all();

It seems that the previous version of the patchset called this
in nbcon_cpu_emergency_exit() before decrementing the CPU state.

Sigh, it was me who said that it was useless, see
https://lore.kernel.org/all/ZQ3R4Lz1LHQYsylw@alley/

I am sorry for this.

In my defense, it was a generic function added early in the patchset,
and it was used by both emergency and panic contexts. It is possible
that I did not know about the emergency context behavior at
the time of review.


> 	if (have_legacy_console) {
> 		if (console_trylock())
> 			console_unlock();
> 		else
> 			defer_console_output();
> 	}
> }
>
> But wait, nbcon_flush_all() might have trouble getting the per-console
> lock because it would be called with NBCON_PRIO_NORMAL.
>
>
> Wait, wait, wait.
>
> defer_console_output() schedules wake_up_klogd_work_func(). It flushes
> only legacy consoles. It means that even emergency messages would
> need to wait for the printk kthread.
>
> In other words, it seems that the emergency context does not have any
> effect for nbcon consoles.

This should be fixed by calling nbcon_flush_all() before decrementing
the cpu_emergency_nesting counter.
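
To make the ordering concrete, here is a minimal userspace sketch of the
idea. It is only a model, not the kernel code: every name below is a
hypothetical stand-in, a plain global replaces the per-CPU counter, and
preemption handling is left out. It also assumes that a flush is treated
as emergency work only while the nesting counter is still non-zero, which
is why the flush has to happen before the decrement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static unsigned int emergency_nesting;	/* models the per-CPU nesting counter */
static unsigned int direct_flushes;	/* number of direct flush attempts */
static bool last_flush_in_emergency;	/* was the counter non-zero during the flush? */

/* Models nbcon_flush_all(): records whether the caller is still in emergency state. */
static void model_nbcon_flush_all(void)
{
	direct_flushes++;
	last_flush_in_emergency = (emergency_nesting > 0);
}

static void model_emergency_enter(void)
{
	emergency_nesting++;
}

static void model_emergency_exit(void)
{
	/* Stands in for the WARN_ON_ONCE() check in the patch. */
	assert(emergency_nesting > 0);

	/* Flush when leaving the outermost section, before the decrement. */
	if (emergency_nesting == 1)
		model_nbcon_flush_all();

	emergency_nesting--;
}

/* Two nested sections; returns the number of flushes that happened. */
static unsigned int demo_nested_emergency(void)
{
	model_emergency_enter();
	model_emergency_enter();
	model_emergency_exit();	/* inner exit: no flush */
	model_emergency_exit();	/* outermost exit: one flush */
	return direct_flushes;
}
```

In this model the inner exit does not flush, and the single flush at the
outermost exit still sees a non-zero counter.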

Best Regards,
Petr