Re: [PATCH v2 11/19] sched/numa: Restrict migrating in parallel to the same node.

From: Peter Zijlstra
Date: Mon Jul 23 2018 - 06:38:48 EST


On Wed, Jun 20, 2018 at 10:32:52PM +0530, Srikar Dronamraju wrote:
> Since task migration under numa balancing can happen in parallel, more
> than one task might choose to move to the same node at the same time.
> This can cause load imbalances at the node level.
>
> The problem is more likely if there are more cores per node or more
> nodes in system.
>
> Use a per-node variable to indicate if task migration
> to the node under numa balance is currently active.
> This per-node variable will not track swapping of tasks.


> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 50c7727..87fb20e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1478,11 +1478,22 @@ struct task_numa_env {
>  static void task_numa_assign(struct task_numa_env *env,
>  			     struct task_struct *p, long imp)
>  {
> +	pg_data_t *pgdat = NODE_DATA(cpu_to_node(env->dst_cpu));
>  	struct rq *rq = cpu_rq(env->dst_cpu);
>
>  	if (xchg(&rq->numa_migrate_on, 1))
>  		return;
>
> +	if (!env->best_task && env->best_cpu != -1)
> +		WRITE_ONCE(pgdat->active_node_migrate, 0);
> +
> +	if (!p) {
> +		if (xchg(&pgdat->active_node_migrate, 1)) {
> +			WRITE_ONCE(rq->numa_migrate_on, 0);
> +			return;
> +		}
> +	}
> +
>  	if (env->best_cpu != -1) {
>  		rq = cpu_rq(env->best_cpu);
>  		WRITE_ONCE(rq->numa_migrate_on, 0);


Urgh, that's pretty magical code. And it doesn't even have a comment.

For instance, I cannot tell why we clear that active_node_migrate thing
right there.
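
For reference, the claim/rollback pattern the hunk appears to build on can
be sketched in userspace C11 as below. This is only an illustration under
stated assumptions: struct rq_state, struct node_state, claim_rq_and_node()
and release_rq_and_node() are hypothetical names, atomic_exchange() stands
in for the kernel's xchg(), and atomic_store() stands in for WRITE_ONCE().
It models only the !p (move without swap) path, and it does not explain
the early clear of active_node_migrate questioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for rq->numa_migrate_on and
 * pgdat->active_node_migrate; 0 = free, 1 = claimed. */
struct rq_state   { atomic_int numa_migrate_on; };
struct node_state { atomic_int active_node_migrate; };

/*
 * Try to claim the destination runqueue first, then its node.
 * atomic_exchange() returns the previous value, so a non-zero result
 * means somebody else already holds the claim.
 */
static bool claim_rq_and_node(struct rq_state *rq, struct node_state *node)
{
	if (atomic_exchange(&rq->numa_migrate_on, 1))
		return false;

	if (atomic_exchange(&node->active_node_migrate, 1)) {
		/* Node is busy: undo the runqueue claim taken above. */
		atomic_store(&rq->numa_migrate_on, 0);
		return false;
	}

	return true;
}

/* Drop both claims once the migration is done or abandoned. */
static void release_rq_and_node(struct rq_state *rq, struct node_state *node)
{
	atomic_store(&node->active_node_migrate, 0);
	atomic_store(&rq->numa_migrate_on, 0);
}

/* Small self-check: two runqueues on one node compete for the node. */
static void claim_demo(void)
{
	struct rq_state rq_a = { 0 }, rq_b = { 0 };
	struct node_state node = { 0 };

	assert(claim_rq_and_node(&rq_a, &node));   /* first claimant wins  */
	assert(!claim_rq_and_node(&rq_b, &node));  /* node already claimed */
	assert(atomic_load(&rq_b.numa_migrate_on) == 0); /* rolled back    */

	release_rq_and_node(&rq_a, &node);
	assert(claim_rq_and_node(&rq_b, &node));   /* node is free again   */
}
```

The point of the rollback is that a failed node claim must not leave the
runqueue flag set, otherwise later migrations to that CPU would be blocked
for no reason.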