Re: [PATCH RFC] sched,numa: move tasks to preferred_node at wakeup time

From: Peter Zijlstra
Date: Fri May 16 2014 - 09:55:00 EST


On Fri, May 16, 2014 at 02:14:50AM -0400, Rik van Riel wrote:
> +#ifdef CONFIG_NUMA_BALANCING
> +static int numa_balance_on_wake(struct task_struct *p, int prev_cpu)
> +{
> +        long load, src_load, dst_load;

> +        int cur_node = cpu_to_node(prev_cpu);
> +        struct numa_group *numa_group = ACCESS_ONCE(p->numa_group);
> +        struct sched_domain *sd;
> +        struct task_numa_env env = {
> +                .p = p,
> +                .best_task = NULL,
> +                .best_imp = 0,
> +                .best_cpu = -1
> +        };

That's all code; ideally you'd move it to after we're done checking all the
reasons not to do any work, say somewhere like...

> +
> +        if (!sched_feat(NUMA))
> +                return prev_cpu;

Yah.. :-( I think some people changed that to numabalancing_enabled.

Fixing that is still on the todo list somewhere.
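
That is, assuming numabalancing_enabled is still the right runtime gate here
(sketch only, I haven't checked it against the current tree), the check would
become something like:

        /* Sketch: gate on the runtime switch rather than the feature bit. */
        if (!numabalancing_enabled)
                return prev_cpu;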

> +
> +        if (p->numa_preferred_nid == -1)
> +                return prev_cpu;
> +
> +        if (p->numa_preferred_nid == cur_node)
> +                return prev_cpu;
> +
> +        if (numa_group && node_isset(cur_node, numa_group->active_nodes))
> +                return prev_cpu;
> +
> +        sd = rcu_dereference(per_cpu(sd_numa, env.src_cpu));
> +        if (sd)
> +                env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
> +
> +        /*
> +         * Cpusets can break the scheduler domain tree into smaller
> +         * balance domains, some of which do not cross NUMA boundaries.
> +         * Tasks that are "trapped" in such domains cannot be migrated
> +         * elsewhere, so there is no point in (re)trying.
> +         */
> +        if (unlikely(!sd)) {

How about you bail early, and then have the above test evaporate?

> +                p->numa_preferred_nid = cur_node;
> +                return prev_cpu;
> +        }

.. here.
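
Roughly what I have in mind -- sketch only, reusing the names from your
patch, and doing the sd_numa lookup on prev_cpu directly on the assumption
that that is what env.src_cpu would be anyway:

static int numa_balance_on_wake(struct task_struct *p, int prev_cpu)
{
        int cur_node = cpu_to_node(prev_cpu);
        struct numa_group *numa_group;
        struct task_numa_env env;
        struct sched_domain *sd;

        /* All the cheap "nothing to do" checks first. */
        if (!sched_feat(NUMA))
                return prev_cpu;

        if (p->numa_preferred_nid == -1 ||
            p->numa_preferred_nid == cur_node)
                return prev_cpu;

        numa_group = ACCESS_ONCE(p->numa_group);
        if (numa_group && node_isset(cur_node, numa_group->active_nodes))
                return prev_cpu;

        /* Bail early on the cpuset-trapped case as well. */
        sd = rcu_dereference(per_cpu(sd_numa, prev_cpu));
        if (unlikely(!sd)) {
                p->numa_preferred_nid = cur_node;
                return prev_cpu;
        }

        /* Only now pay for setting up the heavier state. */
        memset(&env, 0, sizeof(env));
        env.p = p;
        env.best_cpu = -1;
        env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
        env.src_nid = cur_node;
        env.dst_nid = p->numa_preferred_nid;

        /*
         * ... the load comparison against load_too_imbalanced() goes here,
         * unchanged from your patch ...
         */

        return cpumask_first(cpumask_of_node(p->numa_preferred_nid));
}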

> +
> +        /*
> +         * Only allow p to move back to its preferred nid if
> +         * that does not create an imbalance that would cause
> +         * the load balancer to move a task around later.
> +         */
> +        env.src_nid = cur_node;
> +        env.dst_nid = p->numa_preferred_nid;
> +
> +        update_numa_stats(&env.src_stats, env.src_nid);
> +        update_numa_stats(&env.dst_stats, env.dst_nid);
> +
> +        dst_load = env.dst_stats.load;
> +        src_load = env.src_stats.load;
> +
> +        /* XXX missing power terms */
> +        load = task_h_load(p);
> +        dst_load += load;
> +        src_load -= load;
> +
> +        if (load_too_imbalanced(env.src_stats.load, env.dst_stats.load,
> +                                src_load, dst_load, &env))
> +                return prev_cpu;

So I'm thinking that load_too_imbalanced() is from another patch I
haven't yet seen; lemme go see if you did send it and I missed it.
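
In case it helps, guessing purely from the call site above (original loads,
adjusted loads, plus the env for imbalance_pct), I'd expect something along
these lines -- not your code, just the shape the five arguments suggest:

static bool load_too_imbalanced(long orig_src_load, long orig_dst_load,
                                long src_load, long dst_load,
                                struct task_numa_env *env)
{
        long imb, old_imb;

        /* Only the magnitude of the imbalance matters, not its direction. */
        if (dst_load < src_load)
                swap(dst_load, src_load);

        /* Is the new imbalance within the allowed percentage? */
        imb = dst_load * 100 - src_load * env->imbalance_pct;
        if (imb <= 0)
                return false;

        /*
         * Over the threshold; only allow the move if it does not make
         * the pre-existing imbalance any worse.
         */
        if (orig_dst_load < orig_src_load)
                swap(orig_dst_load, orig_src_load);

        old_imb = orig_dst_load * 100 - orig_src_load * env->imbalance_pct;

        return imb > old_imb;
}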

> +
> +        return cpumask_first(cpumask_of_node(p->numa_preferred_nid));
> +}
