Re: Removal of printk safe buffers delays NMI context printk

From: John Ogness
Date: Fri Nov 05 2021 - 12:44:53 EST


On 2021-11-05, Petr Mladek <pmladek@xxxxxxxx> wrote:
> On Fri 2021-11-05 15:03:27, John Ogness wrote:
>> On 2021-11-05, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>>> but we do need that printk flush capability back there and for
>>> nmi_backtrace.
>>
>> Agreed. I had not considered this necessary side-effect when I
>> removed the NMI safe buffers.
>
> Honestly, I do not understand why it stopped working or how
> it worked before.

IIUC, Nick is presenting a problem where a lockup on the other CPUs is
detected. Those CPUs will dump their backtraces from NMI context. But in
their locked-up state the irq_work for those CPUs is not functional. So
even though the messages are in the buffer, there is no one printing the
buffer.

Before the removal, printk_safe_flush() would dump the NMI safe buffers
for all the CPUs into the printk buffer, then trigger an irq_work on
itself (the non-locked-up CPU).

That irq_work trigger was critical, because the other CPUs (which also
triggered irq_works for themselves) aren't able to process irq_works. I
did not consider this case. This is why we still need to trigger an
irq_work here. (Or, as the removed comment hinted at, add some printk()
call to either directly print or trigger the irq_work.)
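
As a rough illustration of the sequence above, here is a userspace toy
model. It is not kernel code: every name in it (toy_cpu, toy_nmi_printk,
toy_trigger_flush, ...) is hypothetical and only mimics the behavior
described, under the assumption that a locked-up CPU never runs its own
irq_work.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy model only: none of these names are real kernel APIs. */
struct toy_cpu {
	bool locked_up;        /* a locked-up CPU never runs its irq_work */
	bool irq_work_pending; /* a deferred "print the buffer" request */
};

struct toy_system {
	struct toy_cpu cpu[NR_CPUS];
	int stored;  /* messages stored in the shared log buffer */
	int printed; /* messages already pushed out to the console */
};

/* printk() from NMI context: store the message and defer the actual
 * printing to an irq_work queued on the same CPU. */
static void toy_nmi_printk(struct toy_system *s, int cpu)
{
	s->stored++;
	s->cpu[cpu].irq_work_pending = true;
}

/* What the detecting CPU does after requesting the backtraces:
 * queue an irq_work on itself so that somebody prints the buffer. */
static void toy_trigger_flush(struct toy_system *s, int this_cpu)
{
	s->cpu[this_cpu].irq_work_pending = true;
}

/* Run pending irq_work on every CPU that is still able to run it.
 * Returns the number of messages printed by this pass. */
static int toy_run_irq_work(struct toy_system *s)
{
	int newly_printed = 0;

	for (int i = 0; i < NR_CPUS; i++) {
		struct toy_cpu *c = &s->cpu[i];

		if (c->locked_up || !c->irq_work_pending)
			continue;
		c->irq_work_pending = false;
		newly_printed += s->stored - s->printed;
		s->printed = s->stored;
	}
	return newly_printed;
}

/* CPUs 1 and 2 are locked up and dump backtraces from NMI context.
 * Nothing is printed until CPU 0 triggers an irq_work on itself. */
static int toy_demo(void)
{
	struct toy_system s = {0};

	s.cpu[1].locked_up = true;
	s.cpu[2].locked_up = true;
	toy_nmi_printk(&s, 1);
	toy_nmi_printk(&s, 2);

	/* Only the locked-up CPUs have irq_work pending: no output. */
	assert(toy_run_irq_work(&s) == 0);

	toy_trigger_flush(&s, 0);
	return toy_run_irq_work(&s);
}
```

In this model the messages sit in the buffer until toy_trigger_flush()
is called on the healthy CPU, which is the role the irq_work trigger in
printk_safe_flush() used to play.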

John Ogness