Re: [PATCH v4] rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug

From: Joel Fernandes
Date: Fri Dec 02 2022 - 08:25:31 EST




> On Dec 2, 2022, at 7:52 AM, Zhang, Qiang1 <qiang1.zhang@xxxxxxxxx> wrote:
>
> On Thu, Dec 01, 2022 at 07:45:33AM +0800, Zqiang wrote:
>> Currently, rcu_tasks_rude_wait_gp() is invoked to wait for one rude
>> RCU-tasks grace period.  If __num_online_cpus == 1, it returns
>> directly, which is taken to indicate the end of the rude RCU-tasks
>> grace period.  Suppose the system has two CPUs and consider the
>> following scenario:
>>
>>         CPU0                                   CPU1 (going offline)
>>                                          migration/1 task:
>>                                      cpu_stopper_thread
>>                                       -> take_cpu_down
>>                                          -> _cpu_disable
>>                                           (dec __num_online_cpus)
>>                                          ->cpuhp_invoke_callback
>>                                                preempt_disable
>>                                                access old_data0
>>            task1
>>  del old_data0                                  .....
>>  synchronize_rcu_tasks_rude()
>>  task1 schedule out
>>  ....
>>  task2 schedule in
>>  rcu_tasks_rude_wait_gp()
>>      ->__num_online_cpus == 1
>>        ->return
>>  ....
>>  task1 schedule in
>>  ->free old_data0
>>                                                preempt_enable
>>
>> When CPU1 decrements __num_online_cpus so that its value becomes one,
>> CPU1 has not yet finished going offline: the stop_machine task
>> (migration/1) is still running on CPU1 and may still be accessing
>> 'old_data0', but 'old_data0' has already been freed on CPU0.
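
For anyone less familiar with the rude flavor, the diagram boils down to
roughly the following update/read pattern.  Everything below (struct
my_data, old_data0, and both functions) is an invented name for
illustration, not code from this patch; the point is only that the
updater must not free the data until the rude grace period has waited
out every preempt-disabled reader, including one on a CPU that has
already decremented __num_online_cpus but is still executing its
hotplug callbacks.

#include <linux/preempt.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct my_data {
        int val;
};

static struct my_data *old_data0;       /* hypothetical, mirrors the diagram */

/* Reader side, e.g. a CPU-hotplug callback: runs with preemption disabled. */
static void reader_in_hotplug_callback(void)
{
        struct my_data *p;

        preempt_disable();
        p = READ_ONCE(old_data0);
        if (p)
                (void)READ_ONCE(p->val);        /* "access old_data0" */
        preempt_enable();
}

/* Updater side, the "task1" column in the diagram: del, wait, then free. */
static void updater(void)
{
        struct my_data *p = old_data0;

        WRITE_ONCE(old_data0, NULL);            /* "del old_data0" */
        /*
         * Must wait for all preempt-disabled readers, even on a CPU that
         * has already decremented __num_online_cpus but is still running.
         */
        synchronize_rcu_tasks_rude();
        kfree(p);                               /* "free old_data0" */
}
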
>>
>> In order to prevent the above scenario from happening, this commit
>> removes the "num_online_cpus() <= 1" fastpath check and adds handling
>> for synchronize_rcu_tasks_generic() being called during early boot
>> (when the rcu_scheduler_active variable is RCU_SCHEDULER_INACTIVE,
>> the scheduler has not yet been initialized, and only the boot CPU is
>> online).
>>
>> Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
>
>> Very good, thank you! I did the usual wordsmithing, including to that
>> error message, so as usual please check to make sure that I didn't mess
>> something up.
>>
>> Thanx, Paul
>>
>> ------------------------------------------------------------------------
>>
>> commit 033ddc5d337984e20b9d49c8af4faa4689727626
>> Author: Zqiang <qiang1.zhang@xxxxxxxxx>
>> Date: Thu Dec 1 07:45:33 2022 +0800
>>
>> rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug
>>
>> The synchronize_rcu_tasks_rude() function invokes rcu_tasks_rude_wait_gp()
>> to wait one rude RCU-tasks grace period. The rcu_tasks_rude_wait_gp()
>> function in turn checks if there is only a single online CPU. If so, it
>> will immediately return, because a call to synchronize_rcu_tasks_rude()
>> is by definition a grace period on a single-CPU system. (We could
>> have blocked!)
>>
>> Unfortunately, this check uses num_online_cpus() without synchronization,
>> which can result in too-short grace periods. To see this, consider the
>> following scenario:
>>
>>         CPU0                                   CPU1 (going offline)
>>                                          migration/1 task:
>>                                      cpu_stopper_thread
>>                                       -> take_cpu_down
>>                                          -> _cpu_disable
>>                                           (dec __num_online_cpus)
>>                                          ->cpuhp_invoke_callback
>>                                                preempt_disable
>>                                                access old_data0
>>            task1
>>  del old_data0                                  .....
>>  synchronize_rcu_tasks_rude()
>>  task1 schedule out
>>  ....
>>  task2 schedule in
>>  rcu_tasks_rude_wait_gp()
>>      ->__num_online_cpus == 1
>>        ->return
>>  ....
>>  task1 schedule in
>>  ->free old_data0
>>                                                preempt_enable
>>
>> When CPU1 decrements __num_online_cpus, its value becomes 1. However,
>> CPU1 has not finished going offline, and will take one last trip through
>> the scheduler and the idle loop before it actually stops executing
>> instructions. Because synchronize_rcu_tasks_rude() is mostly used for
>> tracing, and because both the scheduler and the idle loop can be traced,
>> this means that CPU0's prematurely ended grace period might disrupt the
>> tracing on CPU1. Given that this disruption might include CPU1 executing
>> instructions in memory that was just now freed (and maybe reallocated),
>> this is a matter of some concern.
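
To put that tracing angle in code form, here is a hedged, much-simplified
sketch (not the real ftrace code; unregister_my_probe(), struct
my_probe_ops, and its buffer are invented names) of the kind of teardown
that relies on this grace period.  If rcu_tasks_rude_wait_gp() returns as
soon as __num_online_cpus reads 1, the kfree() calls below can race with
CPU1 still invoking the probe, with preemption disabled, during its final
trips through the scheduler and the idle loop.

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct my_probe_ops {
        void (*func)(void *data);       /* called from traced code, preemption off */
        void *data;                     /* private buffer used by func() */
};

static struct my_probe_ops *active_probe;

static void unregister_my_probe(void)
{
        struct my_probe_ops *ops = active_probe;

        WRITE_ONCE(active_probe, NULL); /* no new invocations after this point */
        /*
         * The rude grace period must also cover a CPU that has already
         * decremented __num_online_cpus but is still tracing its last
         * passes through the scheduler and the idle loop.
         */
        synchronize_rcu_tasks_rude();
        kfree(ops->data);               /* only now safe to free */
        kfree(ops);
}
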
>>
>> This commit therefore removes that problematic single-CPU check from the
>> rcu_tasks_rude_wait_gp() function. This dispenses with the single-CPU
>> optimization, but there is no evidence indicating that this optimization
>> is important. In addition, synchronize_rcu_tasks_generic() contains a
>> similar optimization (albeit only for early boot), which also splats.
>> (As in exactly why are you invoking synchronize_rcu_tasks_rude() so
>> early in boot, anyway???)
>>
>> It is OK for the synchronize_rcu_tasks_rude() function's check to be
>> unsynchronized because the only times that this check can evaluate to
>> true are when there is only a single CPU running with preemption
>> disabled.
>>
>> While in the area, this commit also fixes a minor bug in which a
>> call to synchronize_rcu_tasks_rude() would instead be attributed to
>> synchronize_rcu_tasks().
>>
>> [ paulmck: Add "synchronize_" prefix and "()" suffix. ]
>>
>> Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
>> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>>
>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
>> index 4dda8e6e5707f..d845723c1af41 100644
>> --- a/kernel/rcu/tasks.h
>> +++ b/kernel/rcu/tasks.h
>> @@ -560,8 +560,9 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>>  static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
>>  {
>>          /* Complain if the scheduler has not started. */
>> -        WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
>> -                  "synchronize_rcu_tasks called too soon");
>> +        if (WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
>> +                      "synchronize_%s() called too soon", rtp->name))
>
> Thanks, Paul, for the detailed description and modification 😊.

True statement. Good example for everyone ;-)

- Joel


>
>
>> +                return;
>>
>>          // If the grace-period kthread is running, use it.
>>          if (READ_ONCE(rtp->kthread_ptr)) {
>> @@ -1064,9 +1065,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
>>  // Wait for one rude RCU-tasks grace period.
>>  static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
>>  {
>> -        if (num_online_cpus() <= 1)
>> -                return; // Fastpath for only one CPU.
>> -
>>          rtp->n_ipis += cpumask_weight(cpu_online_mask);
>>          schedule_on_each_cpu(rcu_tasks_be_rude);
>>  }