Re: [PATCH printk v2 4/4] printk: Ignore waiter on panic

From: Petr Mladek
Date: Wed Oct 18 2023 - 05:56:56 EST


On Fri 2023-10-13 22:49:40, John Ogness wrote:
> Commit d51507098ff91 ("printk: disable optimistic spin during
> panic") added checks to avoid becoming a console waiter if a
> panic is in progress. However, the transition to panic can occur
> while there is already a waiter. If the panic occurred in a
> context that does not support printing from the printk() caller
> context, the waiter CPU may become stopped while still stored as
> @console_waiter.

I guess that a "context that does not support printing" means NMI
or printk_safe, where the console handling is deferred.

Another scenario is when the current owner gets stopped
or blocked so that it actually can't pass the lock to
the waiter.

> the panic CPU will see @console_waiter and handover to the
> stopped CPU.
>
> Here a simple example:
>
> CPU0                                 CPU1
> ----                                 ----
> console_lock_spinning_enable()
>                                      console_trylock_spinning()
>                                      [set as console waiter]
> NMI: panic()
> panic_other_cpus_shutdown()
>                                      [stopped as console waiter]
> console_flush_on_panic()
> console_lock_spinning_enable()
> [print 1 record]
> console_lock_spinning_disable_and_check()
> [handover to stopped CPU1]
>
> This results in panic() not flushing the panic messages.

Great catch!

> Fix this by ignoring any console waiter if the panic CPU is
> printing.
>
> Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")
> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> ---
> kernel/printk/printk.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 56d9b4acbbf2..cd6493f12970 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1904,7 +1904,8 @@ static int console_lock_spinning_disable_and_check(int cookie)
>  	console_owner = NULL;
>  	raw_spin_unlock(&console_owner_lock);
>  
> -	if (!waiter) {
> +	/* Waiters are ignored by the panic CPU. */
> +	if (!waiter || this_cpu_in_panic()) {
>  		spin_release(&console_owner_dep_map, _THIS_IP_);
>  		return 0;
>  	}

This seems to work.

Well, I have spent some time thinking about possible scenarios and
I would go even further. I would also block
console_lock_spinning_enable(). I would also do the checks before
taking console_owner_lock.

It would make the behavior symmetric. And more importantly, it would prevent
possible deadlocks caused by console_owner_lock.

See the proposed patch below. I created it to see the changes in the
code. Also I added info about possible scenarios and effects.

Here we go: