Re: [PATCH 2/6] sched: Simplify migration_cpu_stop()

From: Peter Zijlstra
Date: Thu Feb 25 2021 - 03:47:15 EST


On Wed, Feb 24, 2021 at 03:34:36PM +0000, Valentin Schneider wrote:
> On 24/02/21 13:24, Peter Zijlstra wrote:
> > @@ -1950,31 +1931,20 @@ static int migration_cpu_stop(void *data
> > goto out;
> >
> > if (pending) {
> > - p->migration_pending = NULL;
> > + if (p->migration_pending == pending)
> > + p->migration_pending = NULL;
> > complete = true;
> > }
> >
> > - /* migrate_enable() -- we must not race against SCA */
> > - if (dest_cpu < 0) {
> > - /*
> > - * When this was migrate_enable() but we no longer
> > - * have a @pending, a concurrent SCA 'fixed' things
> > - * and we should be valid again. Nothing to do.
> > - */
> > - if (!pending) {
> > - WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
> > - goto out;
> > - }
> > -
>
> This is fixed by 5+6, but at this patch I think you can have double
> completions - I thought this was an issue, but briefly looking at
> completion stuff it might not. In any case, consider:
>
> task_cpu(p) == Y
>
> SCA(p, X);
> SCA(p, Y);
>
>
> SCA(p, Y) will uninstall SCA(p, X)'s pending and complete.
>
> migration/Y kicked by SCA(p, X) will grab arg->pending, which is still
> SCA(p, X)'s pending and also complete.

Right, so I didn't really think too hard about the intermediate states,
given it's all pretty buggered until at least 5. But yeah, double
complete is harmless.

Specifically, the reference the stopper holds on the pending should keep
the waiter's stack, where that pending lives, from getting released
before the second complete() runs.
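
To illustrate that argument, here is a toy user-space model of the
SCA(p, X); SCA(p, Y) sequence above. It is only a sketch, not the kernel
code: all the toy_* names and the struct layout are made up for
illustration, the completion is modelled as a bare counter, and the
initial refcount of 2 (one for the waiter, one for the stopper) is an
assumption about who holds references at this point in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a completion: nothing more than a counter. */
struct toy_completion {
	unsigned int done;
};

/* Hypothetical model of the pending that lives on the waiter's stack. */
struct toy_pending {
	int refs;			/* assumed: waiter + stopper */
	struct toy_completion done;
};

struct toy_result {
	bool released_early;		/* could the stack go away too soon? */
	unsigned int leftover_done;	/* completions nobody consumed */
	int final_refs;
};

static void toy_complete(struct toy_completion *c)
{
	c->done++;
}

/* Non-blocking model of a wait: consume one completion if there is one. */
static bool toy_try_wait(struct toy_completion *c)
{
	if (!c->done)
		return false;
	c->done--;
	return true;
}

static void toy_put(struct toy_pending *p)
{
	assert(p->refs > 0);
	p->refs--;
}

/* The waiter may only return (and release its stack) at zero references. */
static bool toy_stack_may_be_released(const struct toy_pending *p)
{
	return p->refs == 0;
}

struct toy_result toy_double_complete_scenario(void)
{
	struct toy_pending pending_x = { .refs = 2, .done = { 0 } };
	struct toy_result res;

	/* SCA(p, Y) uninstalls SCA(p, X)'s pending and completes it. */
	toy_complete(&pending_x.done);

	/* The SCA(p, X) waiter wakes up and drops its own reference. */
	bool woke = toy_try_wait(&pending_x.done);
	assert(woke);
	toy_put(&pending_x);

	/* The stopper still holds a reference, so the stack must stay. */
	res.released_early = toy_stack_may_be_released(&pending_x);

	/* migration/Y, kicked by SCA(p, X), completes the same pending again. */
	toy_complete(&pending_x.done);
	toy_put(&pending_x);

	res.leftover_done = pending_x.done.done;
	res.final_refs = pending_x.refs;
	return res;
}
```

In this model the second toy_complete() only leaves a spare count that
nobody consumes, and the waiter cannot release its stack until the
stopper has dropped its reference.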