Re: [PATCH printk v2 10/11] rcu: Add atomic write enforcement for rcu stalls

From: Petr Mladek
Date: Wed Sep 27 2023 - 11:00:33 EST


On Wed 2023-09-20 01:14:55, John Ogness wrote:
> Invoke the atomic write enforcement functions for rcu stalls to
> ensure that the information gets out to the consoles.
>
> It is important to note that if there are any legacy consoles
> registered, they will be attempting to directly print from the
> printk-caller context, which may jeopardize the reliability of
> the atomic consoles. Optimally there should be no legacy
> consoles registered.
>
> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> ---
> kernel/rcu/tree_stall.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 6f06dc12904a..0a58f8b233d8 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -8,6 +8,7 @@
> */
>
> #include <linux/kvm_para.h>
> +#include <linux/console.h>
>
> //////////////////////////////////////////////////////////////////////////////
> //
> @@ -582,6 +583,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
>
> static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> {
> + enum nbcon_prio prev_prio;
> int cpu;
> unsigned long flags;
> unsigned long gpa;
> @@ -597,6 +599,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> if (rcu_stall_is_suppressed())
> return;
>
> + prev_prio = nbcon_atomic_enter(NBCON_PRIO_EMERGENCY);
> +
> /*
> * OK, time to rat on our buddy...
> * See Documentation/RCU/stallwarn.rst for info on how to debug
> @@ -651,6 +655,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> panic_on_rcu_stall();
>
> rcu_force_quiescent_state(); /* Kick them all. */
> +
> + nbcon_atomic_exit(NBCON_PRIO_EMERGENCY, prev_prio);

The locations look reasonable to me. I just hope that we will
use another API, nbcon_emergency_enter()/exit(), in the end.

Note that the new API would allow flushing the messages in
the emergency context immediately from printk().

In that case, we would need to handle nmi_trigger_cpumask_backtrace()
in a special way.

This function would be called from the emergency context but
the nmi_cpu_backtrace() callbacks would be called on other
CPUs in normal context.

For this case I would add something like:

void nbcon_flush_all_emergency(void)
{
	enum nbcon_prio prio = nbcon_get_default_prio();

	/* Flush only when the calling CPU is in emergency (or panic) context. */
	if (prio >= NBCON_PRIO_EMERGENCY)
		nbcon_flush_all();
}

, where the POC of nbcon_get_default_prio() and nbcon_flush_all()
was in the reply to the 7th patch, see
https://lore.kernel.org/all/ZRLBxsXPCym2NC5Q@alley/
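
To make the intended behavior easier to see, here is a minimal userspace
sketch of the idea. It is not the kernel implementation. All mock_*()
names, the single nesting counter standing in for per-CPU state, and the
enum values are assumptions made only for illustration. The real
nbcon_get_default_prio() and nbcon_flush_all() are the POC from the link
above. The sketch only shows that the flush happens when the caller is
inside an emergency section and is skipped in normal context:

```c
/* Priority names follow the thread; the numeric values are assumptions. */
enum nbcon_prio {
	NBCON_PRIO_NONE = 0,
	NBCON_PRIO_NORMAL,
	NBCON_PRIO_EMERGENCY,
	NBCON_PRIO_PANIC,
};

/* Hypothetical stand-in for per-CPU state: emergency section nesting depth. */
static unsigned int mock_emergency_nesting;

/* Counts how often the mock flush actually ran. */
static unsigned int mock_flush_count;

/* Hypothetical counterparts of nbcon_emergency_enter()/exit(). */
static void mock_emergency_enter(void)
{
	mock_emergency_nesting++;
}

static void mock_emergency_exit(void)
{
	if (mock_emergency_nesting)
		mock_emergency_nesting--;
}

/* Mock of nbcon_get_default_prio(): derive the priority from the context. */
static enum nbcon_prio mock_get_default_prio(void)
{
	if (mock_emergency_nesting)
		return NBCON_PRIO_EMERGENCY;

	return NBCON_PRIO_NORMAL;
}

/* Mock of nbcon_flush_all(): only records that a flush was requested. */
static void mock_flush_all(void)
{
	mock_flush_count++;
}

/* Same shape as the proposed nbcon_flush_all_emergency(). */
static void mock_flush_all_emergency(void)
{
	enum nbcon_prio prio = mock_get_default_prio();

	if (prio >= NBCON_PRIO_EMERGENCY)
		mock_flush_all();
}
```

With this shape, the caller of nmi_trigger_cpumask_backtrace() could do
the flush itself once the other CPUs have stored their backtraces, since
the callbacks on those CPUs run in normal context and would not flush
on their own.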


Best Regards,
Petr