Re: [PATCH] x86: fix system without memory on node0

From: Jack Steiner
Date: Wed May 13 2009 - 12:52:31 EST


On Tue, May 12, 2009 at 06:34:31PM -0700, Yinghai Lu wrote:
>
> Jack found a crash on a system that doesn't have memory on node0.
>
> it turns out with per_cpu changeset, node_number for BSP will be always 0,
> and it is consistent to cpu_to_node() that is to near node already.
> aka when numa_set_node() for node0 is called early before per_cpu area is
> setup
>
> try to set the node_number for boot cpu, after we get per_cpu area setup.
>
> [ Impact: fix crashing on memoryless node 0]
>
> Reported-by: Jack Steiner <steiner@xxxxxxx>
> Signed-off-by: Yinghai Lu <yinghai@xxxxxxxxxx>
>
> ---
> arch/x86/kernel/setup_percpu.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> Index: linux-2.6/arch/x86/kernel/setup_percpu.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup_percpu.c
> +++ linux-2.6/arch/x86/kernel/setup_percpu.c
> @@ -423,6 +423,14 @@ void __init setup_per_cpu_areas(void)
> early_per_cpu_ptr(x86_cpu_to_node_map) = NULL;
> #endif
>
> +#if defined(CONFIG_X86_64) && defined(CONFIG_NUMA)
> + /*
> + * make sure boot cpu node_number is right, when boot cpu is on the
> + * node that doesn't have mem installed
> + */
> + per_cpu(node_number, boot_cpu_id) = cpu_to_node(boot_cpu_id);
> +#endif
> +
> /* Setup node to cpumask map */
> setup_node_to_cpumask_map();
>

With the patch above PLUS the patch below, I verified that all of our strange
configurations boot to a shell prompt and run simple commands. There are
certainly some corner cases that have not been tested.

Note that both patches are required. The system panics in early boot if either
patch is omitted.
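
For illustration only (this is not part of either patch), here is a small
userspace sketch of the idea behind the second patch: walk candidate nodes in
order of distance from the local node and skip any node that is not online,
so a node with cpus but no memory never lands in the fallback order. Every
name prefixed with sketch_ is hypothetical, the distance and online tables are
made-up data, and the selection loop is a simplification rather than the
kernel's build_zonelists().

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 4

/* Made-up topology: node 0 has cpus but no memory, so it is not online. */
static const bool sketch_node_online[SKETCH_MAX_NODES] = {
	false, true, true, true
};

/* Made-up distance table, indexed [from][to]. */
static const int sketch_distance[SKETCH_MAX_NODES][SKETCH_MAX_NODES] = {
	{ 10, 20, 40, 30 },
	{ 20, 10, 30, 40 },
	{ 40, 30, 10, 20 },
	{ 30, 40, 20, 10 },
};

/* Return the closest not-yet-used node (online or not), or -1 if none. */
static int sketch_next_best_node(int local, bool used[SKETCH_MAX_NODES])
{
	int best = -1;

	for (int n = 0; n < SKETCH_MAX_NODES; n++) {
		if (used[n])
			continue;
		if (best < 0 ||
		    sketch_distance[local][n] < sketch_distance[local][best])
			best = n;
	}
	if (best >= 0)
		used[best] = true;
	return best;
}

/* Fill order[] with the fallback order for 'local'; return the entry count. */
static int sketch_build_order(int local, int order[SKETCH_MAX_NODES])
{
	bool used[SKETCH_MAX_NODES] = { false };
	int count = 0;
	int node;

	assert(local >= 0 && local < SKETCH_MAX_NODES);

	while ((node = sketch_next_best_node(local, used)) >= 0) {
		/* Same shape as the check added by the patch below. */
		if (!sketch_node_online[node])
			continue;
		order[count++] = node;
	}
	return count;
}
```

With this made-up table, calling sketch_build_order(0, order) leaves node 0
out of the result even though it is the local node, which is the case the
check is meant to cover.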

---


Ignore offline nodes when building the zone lists. This
fix is needed to support configurations that have PXMs with
cpus but no memory.


Signed-off-by: Jack Steiner <steiner@xxxxxxx>


---
mm/page_alloc.c | 2 ++
1 file changed, 2 insertions(+)

Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c 2009-05-12 17:06:59.000000000 -0500
+++ linux/mm/page_alloc.c 2009-05-13 09:54:09.000000000 -0500
@@ -2370,6 +2370,8 @@ static void build_zonelists(pg_data_t *p
* If another node is sufficiently far away then it is better
* to reclaim pages in a zone before going off node.
*/
+ if (!node_online(node))
+ continue;
if (distance > RECLAIM_DISTANCE)
zone_reclaim_mode = 1;
