Re: [PATCH 8/9] sched: Fix migrate_disable() vs set_cpus_allowed_ptr()

From: Peter Zijlstra
Date: Fri Sep 25 2020 - 05:56:35 EST


On Fri, Sep 25, 2020 at 11:05:28AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 24, 2020 at 08:59:33PM +0100, Valentin Schneider wrote:
> > > @@ -2025,19 +2138,8 @@ static int __set_cpus_allowed_ptr(struct
> > >  	if (cpumask_test_cpu(task_cpu(p), new_mask))
> > >  		goto out;
> >
> > I think this needs a cancellation of any potential pending migration
> > requests. Consider a task P0 running on CPU0:
> >
> >   P0                      P1                                P2
> >
> >   migrate_disable();
> >   <preempt>
> >                           set_cpus_allowed_ptr(P0, CPU1);
> >                           // waits for completion
> >                                                             set_cpus_allowed_ptr(P0, CPU0);
> >                                                             // Already good, no waiting for completion
> >   <resumes>
> >   migrate_enable();
> >   // task_cpu(p) allowed, no move_task()
> >
> > AIUI in this scenario P1 would stay forever waiting.
>

> The other approach is trying to handle that last condition in
> move_task(), but I'm quite sure that's going to be awful too :/

Something like so perhaps?

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2039,6 +2039,10 @@ static int move_task(struct rq *rq, stru
 	if (WARN_ON_ONCE(!pending))
 		return -EINVAL;
 
+	/* Can the task run on the task's current CPU? If so, we're done */
+	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
+		goto easy;
+
 	arg.done = &pending->done;
 
 	if (flags & SCA_MIGRATE_ENABLE) {
@@ -2063,6 +2067,7 @@ static int move_task(struct rq *rq, stru
 			if (task_on_rq_queued(p))
 				rq = move_queued_task(rq, rf, p, dest_cpu);
 
+easy:
 			p->migration_pending = NULL;
 			complete = true;
 		}
@@ -2151,10 +2156,6 @@ static int __set_cpus_allowed_ptr(struct
 			p->nr_cpus_allowed != 1);
 	}
 
-	/* Can the task run on the task's current CPU? If so, we're done */
-	if (cpumask_test_cpu(task_cpu(p), new_mask))
-		goto out;
-
 	return move_task(rq, &rf, p, dest_cpu, flags);
 
 out:
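
For anyone wanting to poke at the P0/P1/P2 scenario outside the kernel, here is a rough user-space toy model of it. This is only a sketch and not kernel code: every toy_* name is made up for illustration, the "completion" is a plain bool, there is no locking, refcounting or stopper thread, and the 'patched' flag merely selects whether the "current CPU still allowed" check sits in the set-affinity path (as before) or in the move_task() stand-in (as in the diff above).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the pending completion; NOT the kernel's type. */
struct toy_pending {
	bool done;
};

/* Toy stand-in for the bits of task state the scenario touches. */
struct toy_task {
	int cpu;			/* CPU the task currently sits on */
	unsigned long cpus_mask;	/* bit n set: CPU n is allowed */
	bool migration_disabled;
	struct toy_pending *pending;	/* models p->migration_pending */
};

/* Lowest allowed CPU in the mask, or -1 if the mask is empty. */
static int toy_first_cpu(unsigned long mask)
{
	for (int cpu = 0; cpu < (int)(8 * sizeof(mask)); cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return -1;
}

/* Models move_task(); with 'patched' the early check lives here. */
static void toy_move_task(struct toy_task *p, int dest_cpu, bool patched)
{
	if (patched && (p->cpus_mask & (1UL << p->cpu)))
		goto easy;

	if (p->migration_disabled)
		return;		/* waiter stays blocked for now */

	p->cpu = dest_cpu;
easy:
	if (p->pending) {
		p->pending->done = true;	/* wake whoever is waiting */
		p->pending = NULL;
	}
}

/* Models __set_cpus_allowed_ptr(); 'waiter' is the caller's completion. */
static void toy_set_cpus_allowed(struct toy_task *p, unsigned long new_mask,
				 struct toy_pending *waiter, bool patched)
{
	p->cpus_mask = new_mask;

	/* Old behaviour: bail out without looking at p->pending. */
	if (!patched && (new_mask & (1UL << p->cpu)))
		return;

	if (!p->pending)
		p->pending = waiter;
	toy_move_task(p, toy_first_cpu(new_mask), patched);
}

/* Models migrate_enable(): only moves if the current CPU is not allowed. */
static void toy_migrate_enable(struct toy_task *p, bool patched)
{
	p->migration_disabled = false;
	if (p->cpus_mask & (1UL << p->cpu))
		return;		/* task_cpu(p) allowed, no move_task() */
	toy_move_task(p, toy_first_cpu(p->cpus_mask), patched);
}

/* Replays the scenario; returns whether P1's completion ever fired. */
bool toy_scenario(bool patched)
{
	struct toy_pending p1_wait = { false }, p2_wait = { false };
	struct toy_task p0 = { .cpu = 0, .cpus_mask = 0x3 };

	p0.migration_disabled = true;				/* P0 */
	toy_set_cpus_allowed(&p0, 1UL << 1, &p1_wait, patched);	/* P1 */
	toy_set_cpus_allowed(&p0, 1UL << 0, &p2_wait, patched);	/* P2 */
	toy_migrate_enable(&p0, patched);			/* P0 */
	return p1_wait.done;
}
```

In this model, toy_scenario(false) leaves P1's flag unset, which corresponds to P1 waiting forever, while toy_scenario(true) sets it when P2's call reaches the 'easy' label. It says nothing about the real locking or the stopper paths, so treat it as a way to reason about the ordering only.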