[RFC 1/2] workqueue: use the nearest NUMA node, not the local one

From: Nishanth Aravamudan
Date: Thu Jul 17 2014 - 19:10:19 EST


In the presence of memoryless nodes, the workqueue code incorrectly uses
cpu_to_node() to choose the node that a pool's memory allocations should
prefer. On a memoryless node that preference names a node with no memory
to allocate from. cpu_to_mem() should be used instead, since it returns
the nearest NUMA node that has memory.

Signed-off-by: Nishanth Aravamudan <nacc@xxxxxxxxxxxxxxxxxx>
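
To illustrate the distinction the changelog relies on, here is a minimal
user-space sketch. It is not the kernel implementation: the topology, the
distance table and every sketch_* name are hypothetical, chosen only to
show how a CPU's own node can differ from its nearest node with memory,
and why caching the result per CPU makes the lookup cheap.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_NR_NODES 3
#define SKETCH_NR_CPUS  6

/* Hypothetical topology: node 1 owns CPUs 2-3 but has no memory. */
static const int sketch_cpu_node[SKETCH_NR_CPUS] = { 0, 0, 1, 1, 2, 2 };
static const int sketch_node_has_memory[SKETCH_NR_NODES] = { 1, 0, 1 };

/* Hypothetical distances; node 1 is closer to node 0 than to node 2. */
static const int sketch_distance[SKETCH_NR_NODES][SKETCH_NR_NODES] = {
	{ 10, 20, 40 },
	{ 20, 10, 30 },
	{ 40, 30, 10 },
};

/* Per-CPU cache of the nearest node with memory, filled once at init. */
static int sketch_cpu_mem[SKETCH_NR_CPUS];

/* Node the CPU belongs to, whether or not that node has memory. */
static int sketch_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return sketch_cpu_node[cpu];
}

/* Slow path: scan every node for the closest one that has memory. */
static int sketch_local_memory_node(int node)
{
	int best = -1, best_dist = INT_MAX;

	assert(node >= 0 && node < SKETCH_NR_NODES);
	for (int n = 0; n < SKETCH_NR_NODES; n++) {
		if (!sketch_node_has_memory[n])
			continue;
		if (sketch_distance[node][n] < best_dist) {
			best_dist = sketch_distance[node][n];
			best = n;
		}
	}
	return best;
}

/* Run once up front: cache the slow-path answer for every CPU. */
static void sketch_init_cpu_mem(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_cpu_mem[cpu] =
			sketch_local_memory_node(sketch_cpu_to_node(cpu));
}

/* Fast path: a cached lookup, analogous in spirit to cpu_to_mem(). */
static int sketch_cpu_to_mem(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return sketch_cpu_mem[cpu];
}
```

With this made-up topology, once sketch_init_cpu_mem() has run,
sketch_cpu_to_node(2) returns 1 (the memoryless node) while
sketch_cpu_to_mem(2) returns 0, the node an allocation can actually be
satisfied from.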

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 35974ac..0bba022 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3547,7 +3547,12 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
 		for_each_node(node) {
 			if (cpumask_subset(pool->attrs->cpumask,
 					   wq_numa_possible_cpumask[node])) {
-				pool->node = node;
+				/*
+				 * We could use local_memory_node(node) here,
+				 * but it is expensive and the following caches
+				 * the same value.
+				 */
+				pool->node = cpu_to_mem(cpumask_first(pool->attrs->cpumask));
 				break;
 			}
 		}
@@ -4921,7 +4926,7 @@ static int __init init_workqueues(void)
 			pool->cpu = cpu;
 			cpumask_copy(pool->attrs->cpumask, cpumask_of(cpu));
 			pool->attrs->nice = std_nice[i++];
-			pool->node = cpu_to_node(cpu);
+			pool->node = cpu_to_mem(cpu);

 			/* alloc pool ID */
 			mutex_lock(&wq_pool_mutex);
