RE: [PATCH v4] rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug

From: Zhang, Qiang1
Date: Fri Dec 02 2022 - 07:52:06 EST


On Thu, Dec 01, 2022 at 07:45:33AM +0800, Zqiang wrote:
> Currently, invoke rcu_tasks_rude_wait_gp() to wait one rude
> RCU-tasks grace period, if __num_online_cpus == 1, will return
> directly, indicates the end of the rude RCU-task grace period.
> suppose the system has two cpus, consider the following scenario:
>
> CPU0 CPU1 (going offline)
> migration/1 task:
> cpu_stopper_thread
> -> take_cpu_down
> -> _cpu_disable
> (dec __num_online_cpus)
> ->cpuhp_invoke_callback
> preempt_disable
> access old_data0
> task1
> del old_data0 .....
> synchronize_rcu_tasks_rude()
> task1 schedule out
> ....
> task2 schedule in
> rcu_tasks_rude_wait_gp()
> ->__num_online_cpus == 1
> ->return
> ....
> task1 schedule in
> ->free old_data0
> preempt_enable
>
> when CPU1 dec __num_online_cpus and __num_online_cpus is equal one,
> the CPU1 has not finished offline, stop_machine task(migration/1)
> still running on CPU1, maybe still accessing 'old_data0', but the
> 'old_data0' has freed on CPU0.
>
> In order to prevent the above scenario from happening, this commit
> remove check for __num_online_cpus == 0 and add handling of calling
> synchronize_rcu_tasks_generic() during early boot(when the
> rcu_scheduler_active variable is RCU_SCHEDULER_INACTIVE, the scheduler
> not yet initialized and only one boot-CPU online).
>
> Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>

>Very good, thank you! I did the usual wordsmithing, including to that
>error message, so as usual please check to make sure that I didn't mess
>something up.
>
> Thanx, Paul
>
>------------------------------------------------------------------------
>
>commit 033ddc5d337984e20b9d49c8af4faa4689727626
>Author: Zqiang <qiang1.zhang@xxxxxxxxx>
>Date: Thu Dec 1 07:45:33 2022 +0800
>
> rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug
>
> The synchronize_rcu_tasks_rude() function invokes rcu_tasks_rude_wait_gp()
> to wait one rude RCU-tasks grace period. The rcu_tasks_rude_wait_gp()
> function in turn checks if there is only a single online CPU. If so, it
> will immediately return, because a call to synchronize_rcu_tasks_rude()
> is by definition a grace period on a single-CPU system. (We could
> have blocked!)
>
> Unfortunately, this check uses num_online_cpus() without synchronization,
> which can result in too-short grace periods. To see this, consider the
> following scenario:
>
> CPU0 CPU1 (going offline)
> migration/1 task:
> cpu_stopper_thread
> -> take_cpu_down
> -> _cpu_disable
> (dec __num_online_cpus)
> ->cpuhp_invoke_callback
> preempt_disable
> access old_data0
> task1
> del old_data0 .....
> synchronize_rcu_tasks_rude()
> task1 schedule out
> ....
> task2 schedule in
> rcu_tasks_rude_wait_gp()
> ->__num_online_cpus == 1
> ->return
> ....
> task1 schedule in
> ->free old_data0
> preempt_enable
>
> When CPU1 decrements __num_online_cpus, its value becomes 1. However,
> CPU1 has not finished going offline, and will take one last trip through
> the scheduler and the idle loop before it actually stops executing
> instructions. Because synchronize_rcu_tasks_rude() is mostly used for
> tracing, and because both the scheduler and the idle loop can be traced,
> this means that CPU0's prematurely ended grace period might disrupt the
> tracing on CPU1. Given that this disruption might include CPU1 executing
> instructions in memory that was just now freed (and maybe reallocated),
> this is a matter of some concern.
>
> This commit therefore removes that problematic single-CPU check from the
> rcu_tasks_rude_wait_gp() function. This dispenses with the single-CPU
> optimization, but there is no evidence indicating that this optimization
> is important. In addition, synchronize_rcu_tasks_generic() contains a
> similar optimization (albeit only for early boot), which also splats.
> (As in exactly why are you invoking synchronize_rcu_tasks_rude() so
> early in boot, anyway???)
>
> It is OK for the synchronize_rcu_tasks_rude() function's check to be
> unsynchronized because the only times that this check can evaluate to
> true is when there is only a single CPU running with preemption
> disabled.
>
> While in the area, this commit also fixes a minor bug in which a
> call to synchronize_rcu_tasks_rude() would instead be attributed to
> synchronize_rcu_tasks().
>
> [ paulmck: Add "synchronize_" prefix and "()" suffix. ]
>
> Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
>diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
>index 4dda8e6e5707f..d845723c1af41 100644
>--- a/kernel/rcu/tasks.h
>+++ b/kernel/rcu/tasks.h
>@@ -560,8 +560,9 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
> {
> /* Complain if the scheduler has not started. */
>- WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
>- "synchronize_rcu_tasks called too soon");
>+ if (WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
>+ "synchronize_%s() called too soon", rtp->name))

Thanks Paul, detailed description and modification 😊.


>+ return;
>
> // If the grace-period kthread is running, use it.
> if (READ_ONCE(rtp->kthread_ptr)) {
>@@ -1064,9 +1065,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
> // Wait for one rude RCU-tasks grace period.
> static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
> {
>- if (num_online_cpus() <= 1)
>- return; // Fastpath for only one CPU.
>-
> rtp->n_ipis += cpumask_weight(cpu_online_mask);
> schedule_on_each_cpu(rcu_tasks_be_rude);
> }