Re: [PATCH] rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug

From: Neeraj Upadhyay
Date: Sat Nov 26 2022 - 00:20:58 EST


Hi Zqiang,

On 11/25/2022 9:24 PM, Zqiang wrote:
Currently, for the case of num_online_cpus() <= 1, return directly,
indicates the end of current grace period and then release old data.
it's not accurate, for SMP system, when num_online_cpus() is equal
one, maybe another cpu that in offline process(after invoke
__cpu_disable()) is still in the rude RCU-Tasks critical section
holding the old data, this lead to memory corruption.


Was this race seen in your testing? For the outgoing CPU, once that
CPU marks itself offline (and decrements __num_online_cpus), do we
have tracing active on that CPU, and synchronize_rcu_tasks_rude()
not waiting for it could potentially lead to memory corruption?

As per my understanding, given that outgoing/incoming CPU decrements/increments the __num_online_cpus value, and num_online_cpus()
is a plain read, problem could happen when the incoming CPU updates the
__num_online_cpus value, however, rcu_tasks_rude_wait_gp()'s num_online_cpus() call didn't observe the increment. So, cpus_read_lock/unlock() seems to be required to handle this case.



Thanks
Neeraj

Therefore, this commit add cpus_read_lock/unlock() before executing
num_online_cpus().

Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
---
kernel/rcu/tasks.h | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 4a991311be9b..08e72c6462d8 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1033,14 +1033,30 @@ static void rcu_tasks_be_rude(struct work_struct *work)
{
}
+static DEFINE_PER_CPU(struct work_struct, rude_work);
+
// Wait for one rude RCU-tasks grace period.
static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
{
+ int cpu;
+ struct work_struct *work;
+
+ cpus_read_lock();
if (num_online_cpus() <= 1)
- return; // Fastpath for only one CPU.
+ goto end;// Fastpath for only one CPU.
rtp->n_ipis += cpumask_weight(cpu_online_mask) > - schedule_on_each_cpu(rcu_tasks_be_rude);
+ for_each_online_cpu(cpu) {
+ work = per_cpu_ptr(&rude_work, cpu);
+ INIT_WORK(work, rcu_tasks_be_rude);
+ schedule_work_on(cpu, work);
+ }
+
+ for_each_online_cpu(cpu)
+ flush_work(per_cpu_ptr(&rude_work, cpu));
+
+end:
+ cpus_read_unlock();
}
void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);