Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes

From: Mel Gorman
Date: Mon Sep 04 2006 - 11:33:16 EST


On (31/08/06 20:08), Keith Mannthey didst pronounce:
> >So, do you actally expect a lot of unused mem_map to be allocated with
> >struct pages that are inactive until memory is hot-added in an
> >x86_64-specific manner? The arch-independent stuff currently will not do
> >that. It sets up memmap for where memory really exists. If that is not
> >what you expect, it will hit issues at hotadd time which is not the
> >current issue but one that can be fixed.
>
> Yes. RESERVED based is a big waste of mem_map space. The add areas
> are marked as RESERVED during boot and then later onlined during add.
> It might be ok. I will play with tomorrow. I might just need to
> call add_active_range in the right spot :)
>

Following this mail should be two patches that may address the problem
with reserved memory hot-add. One assumption made by arch-independent
zone-sizing was that the only memory holes of interest were those before the
end of physical memory. Another assumption was that mem_map should only be
allocated for memory that was physically present in the machine.

With MEMORY_HOTPLUG_RESERVE on x86_64, these assumptions do not hold. This
feature expects that mem_map is allocated at boot time and later activated
on a memory hot-add event. To determine if the region is usable for hot-add
in the future, holes are calculated beyond the end of physical memory.

The following two patches fix these two assumptions. They have been boot-tested
on a range of hardware (x86, ppc64, ia64 and x86_64) so there should be no
new regressions.

I don't have access to hardware that can use MEMORY_HOTPLUG_RESERVE so I'd
appreciate hearing if the patches work. I wrote a test program that simulated
the input from the machine the problem was reported on. It registers active
memory and simulates the check made by reserve_hotadd(). push_node_boundaries()
is called to push the end of the node out by 100 pages like what SRAT would
do for reserve hot-add and it appears to do the right thing. Output is below.

mel@arnold:~/patches/brokenout/zonesizing/driver_test$ gcc driver_test.c -o
driver_test && ./driver_test | grep -v "active with" | grep -v
account_node_boundary
Stage 1: Registering active ranges
Entering add_active_range(0, 0, 152) 0 entries of 96 used
Entering add_active_range(0, 256, 524165) 1 entries of 96 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 96 used
Entering add_active_range(1, 17235968, 18219008) 3 entries of 96 used

Dumping active map
0: 0 0 -> 152
1: 0 256 -> 524165
2: 0 1048576 -> 4653056
3: 1 17235968 -> 18219008
Entering push_node_boundaries(0, 0, 4653156)

Checking reserve-hotadd
absent_pages_in_range(4653056, 17235968) == 17235968 - 4653056 == 12582912
absent_pages_in_range(18219008, 52428800) == 52428800 - 18219008 == 34209792

Stage 2: Calculating zone sizes and holes

Stage 3: Dumping zone sizes and holes
zone_size[0][0] = 4096 zone_holes[0][0] = 104
zone_size[0][1] = 1044480 zone_holes[0][1] = 524411
zone_size[0][2] = 3604580 zone_holes[0][2] = 100
zone_size[1][2] = 983040 zone_holes[1][2] = 0

Stage 4: Printing present pages
On node 0, 4128541 pages
zone 0 present_pages = 3992
zone 1 present_pages = 520069
zone 2 present_pages = 3604480
On node 1, 983040 pages
zone 2 present_pages = 983040

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/