Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed

From: Steven Rostedt
Date: Fri Nov 17 2017 - 17:49:03 EST


On Wed, 15 Nov 2017 14:25:29 -0500
joe.korty@xxxxxxxxxxxxxxxxx wrote:

> 4.4.86-rt99's patch
>
> 0037-Intrduce-migrate_disable-cpu_light.patch
>
> introduces a place where a task's cpus_allowed mask is
> updated without a corresponding update to nr_cpus_allowed.
>
> This path is executed when task affinity is changed while
> __migrate_disabled() is true. As there is no code present
> to set nr_cpus_allowed when the migrate_disable state is
> dropped, from that point on the scheduler may make incorrect
> scheduling decisions for this task.
>
> My testing consists of temporarily adding a
>
> 	if (tsk_nr_cpus_allowed(p) != cpumask_weight(tsk_cpus_allowed(p)))
> 		printk_ratelimited(...);

Have you tested whether v4.9-rt or v4.13-rt has the same bug? If it is a
bug in 4.13-rt, then it needs to go there first and then be backported to
the stable releases (which I'm actually working on now).
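Re-running the reproduction on v4.9-rt or v4.13-rt needs an affinity-rotation load like the one the patch describes, but that program was not posted. Below is a minimal sketch of one rotation pass, assuming Linux with glibc's sched_getaffinity(2)/sched_setaffinity(2) and its CPU_* macros. rotate_tid_once() is a hypothetical helper name. Driving stress(1) would mean calling it repeatedly for every TID listed under /proc/<pid>/task; that enumeration is left out. User space cannot see nr_cpus_allowed, so this loop only exercises the affinity-change path. The mismatch itself is caught by the temporary in-kernel check. Because the buggy branch is taken only while the target thread is inside a migrate_disable() section, the loop presumably has to run many times against a busy workload.

```c
#define _GNU_SOURCE /* must precede all includes for CPU_COUNT() etc. */
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: pin thread `tid` (0 means the calling thread) to
 * each CPU in its current affinity mask in turn, verify after every step
 * that the kernel reports exactly that one CPU, then try to restore the
 * original mask. Returns the number of CPUs visited, or -1 on any failure.
 */
static int rotate_tid_once(pid_t tid)
{
	cpu_set_t orig, one, check;
	int visited = 0;

	if (sched_getaffinity(tid, sizeof(orig), &orig) != 0)
		return -1;
	/* The kernel never hands back an empty affinity mask. */
	assert(CPU_COUNT(&orig) > 0);

	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &orig))
			continue;

		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		if (sched_setaffinity(tid, sizeof(one), &one) != 0)
			break;
		if (sched_getaffinity(tid, sizeof(check), &check) != 0)
			break;
		/* Exactly one allowed CPU, and it is the one we asked for. */
		if (CPU_COUNT(&check) != 1 || !CPU_ISSET(cpu, &check))
			break;
		visited++;
	}

	/* Put the thread back where it started, even after a failed step. */
	if (sched_setaffinity(tid, sizeof(orig), &orig) != 0)
		return -1;

	/* An early break leaves visited short of the full count. */
	return visited == CPU_COUNT(&orig) ? visited : -1;
}
```

Calling rotate_tid_once(0) pins the caller to each of its allowed CPUs in turn and returns how many it visited, or -1 if any step fails or the kernel reports an unexpected mask. The helper attempts to restore the original mask before returning.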

-- Steve

>
> statement to schedule() and running a simple affinity rotation
> program I wrote, one that rotates the threads of stress(1).
> While rotating, I got the expected kernel error messages.
> With this patch applied, the messages disappeared.
>
> Signed-off-by: Joe Korty <joe.korty@xxxxxxxxxxxxxxxxx>
>
> Index: b/kernel/sched/core.c
> ===================================================================
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1220,6 +1220,7 @@ void do_set_cpus_allowed(struct task_str
>  	lockdep_assert_held(&p->pi_lock);
>  
>  	if (__migrate_disabled(p)) {
> +		p->nr_cpus_allowed = cpumask_weight(new_mask);
>  		cpumask_copy(&p->cpus_allowed, new_mask);
>  		return;
>  	}
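The invariant the patch restores is that nr_cpus_allowed always equals the weight of cpus_allowed, even when the mask is changed inside a migrate-disabled section. The following is a hypothetical user-space model, not kernel code. struct task_model, mask_weight(), and the two setter functions are illustrative names. A 64-bit word stands in for cpumask_t, so the model covers at most 64 CPUs, and the normal (non-migrate-disabled) branch is collapsed into a direct update of both fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the relevant task fields. */
struct task_model {
	unsigned long long cpus_allowed; /* bit n set => may run on CPU n */
	int nr_cpus_allowed;             /* cached weight of cpus_allowed */
	int migrate_disable;             /* nonzero while migration is disabled */
};

/* Stand-in for cpumask_weight(): count the set bits in the mask. */
static int mask_weight(unsigned long long mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Same condition the temporary debug check tests for (with != negated). */
static bool task_mask_consistent(const struct task_model *p)
{
	return p->nr_cpus_allowed == mask_weight(p->cpus_allowed);
}

/* Models the migrate-disabled early return before the fix. */
static void set_cpus_allowed_before_fix(struct task_model *p,
					unsigned long long new_mask)
{
	if (p->migrate_disable) {
		p->cpus_allowed = new_mask; /* nr_cpus_allowed left stale */
		return;
	}
	/* Simplified normal path: both fields updated together. */
	p->cpus_allowed = new_mask;
	p->nr_cpus_allowed = mask_weight(new_mask);
}

/* Models the patched early return: the cached weight follows the mask. */
static void set_cpus_allowed_after_fix(struct task_model *p,
				       unsigned long long new_mask)
{
	if (p->migrate_disable) {
		p->nr_cpus_allowed = mask_weight(new_mask);
		p->cpus_allowed = new_mask;
		assert(task_mask_consistent(p));
		return;
	}
	p->cpus_allowed = new_mask;
	p->nr_cpus_allowed = mask_weight(new_mask);
	assert(task_mask_consistent(p));
}
```

For example, take a task allowed on CPUs 0-3 (mask 0xF, nr_cpus_allowed 4) with migration disabled. Narrowing its mask to 0x1 leaves nr_cpus_allowed at 4 on the unpatched path but sets it to 1 on the patched one. Nothing in the model recomputes the stale count when migrate_disable later drops to zero, which mirrors the situation the patch description points out.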