Re: [PATCH 2/2] sched: Plug race between SCA, hotplug and migration_cpu_stop()

From: Valentin Schneider
Date: Tue Jun 01 2021 - 13:00:03 EST


On 26/05/21 21:57, Valentin Schneider wrote:
> +		dest_cpu = arg->dest_cpu;
> +		if (task_on_rq_queued(p)) {
> +			/*
> +			 * A hotplug operation could have happened between
> +			 * set_cpus_allowed_ptr() and here, making dest_cpu no
> +			 * longer allowed.
> +			 */
> +			if (!is_cpu_allowed(p, dest_cpu))
> +				dest_cpu = select_fallback_rq(cpu_of(rq), p);
> +			/*
> +			 * dest_cpu can be victim of hotplug between is_cpu_allowed()
> +			 * and here. However, per the synchronize_rcu() in
> +			 * sched_cpu_deactivate(), it can't have gone lower than
> +			 * CPUHP_AP_ACTIVE, so it's safe to punt it over and let
> +			 * balance_push() route it elsewhere.
> +			 */
> +			update_rq_clock(rq);
> +			rq = move_queued_task(rq, &rf, p, dest_cpu);

So, while digesting this I started having doubts about per-CPU kthreads, since
they are allowed on any online CPU, not just active ones. The bogus scenario
here would be picking a !active && online CPU, and then seeing it go !online
before the move_queued_task().
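
To make that window concrete, here is a minimal userspace sketch (not kernel
code). The names model_cpu and model_is_cpu_allowed() are made up for
illustration, and the rule is a deliberate simplification of
is_cpu_allowed(): it ignores the affinity mask, migrate-disable and the
other kthread cases, and only keeps the online vs active distinction.

```c
#include <stdbool.h>

/* Hypothetical model of the two hotplug-related CPU states discussed here. */
struct model_cpu {
	bool online;	/* CPU has not gone through take_cpu_down() yet */
	bool active;	/* CPU is still at or above CPUHP_AP_ACTIVE */
};

/*
 * Hypothetical, simplified stand-in for is_cpu_allowed():
 * per-CPU kthreads only need the CPU to be online, while every other
 * task in this model needs it to be active.
 */
static bool model_is_cpu_allowed(const struct model_cpu *cpu, bool pcpu_kthread)
{
	if (pcpu_kthread)
		return cpu->online;

	return cpu->active;
}
```

With a CPU that is online but no longer active, this model accepts a
per-CPU kthread and rejects a regular task, which is exactly the state the
scenario starts from.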

Now, to transition from online -> !online, we have to go through
take_cpu_down(), which is issued via a stop_machine() call. This means the
transition can't happen until all online CPUs are running the stopper task
and have reached MULTI_STOP_RUN.

migration_cpu_stop() is itself a stopper callback, so the CPU running it
can't get to MULTI_STOP_RUN until the callback returns. That should make it
"atomic" vs takedown_cpu(), meaning the above should be fine.
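
The argument can be sketched as a toy, single-threaded model (again
userspace C, not the kernel implementation). Everything prefixed model_ /
MODEL_ is hypothetical, and the real multi_stop state machine has more
states (e.g. MULTI_STOP_DISABLE_IRQ) that are collapsed here into a single
PREPARE -> RUN step.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical subset of the multi_stop states. */
enum model_state { MODEL_STOP_PREPARE, MODEL_STOP_RUN };

struct model_stopper {
	bool busy;	/* still inside an earlier callback, e.g. migration_cpu_stop() */
	bool acked;	/* has entered the multi-stop work and acked the state */
};

/*
 * One polling pass over all stoppers. A stopper runs its works one at a
 * time, so a busy one cannot ack. The state only advances to RUN once
 * every stopper has acked.
 */
static enum model_state model_poll(struct model_stopper *s, int n,
				   enum model_state state)
{
	int acks = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (!s[i].busy)
			s[i].acked = true;
		if (s[i].acked)
			acks++;
	}

	if (state == MODEL_STOP_PREPARE && acks == n)
		return MODEL_STOP_RUN;

	return state;
}
```

As long as one stopper is marked busy, model_poll() keeps returning
MODEL_STOP_PREPARE. Only after that stopper finishes does the state reach
MODEL_STOP_RUN, which is where take_cpu_down() would get to run.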

> +		} else {
> +			p->wake_cpu = dest_cpu;
> +		}
>  	} else if (pending) {
>  		/*
>  		 * This happens when we get migrated between migrate_enable()'s
> --
> 2.25.1