Re: [PATCH RFC] hotplug-memory: refactor online_pages to separatezone growth from page onlining

From: Jeremy Fitzhardinge
Date: Fri Mar 28 2008 - 22:09:50 EST

Next message: Jeff Garzik: "[git patches] net driver fixes"
Previous message: Jeff Garzik: "Re: [PATCH 2/2] Blackfin EMAC Driver: delete unused variables tofixup gcc warnings"
In reply to: Dave Hansen: "Re: [PATCH RFC] hotplug-memory: refactor online_pages to separatezone growth from page onlining"
Next in thread: Dave Hansen: "Re: [PATCH RFC] hotplug-memory: refactor online_pages to separatezone growth from page onlining"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Dave Hansen wrote:

On Fri, 2008-03-28 at 17:00 -0700, Jeremy Fitzhardinge wrote:

The Xen balloon driver needs to separate the process of hot-installing
memory into two phases: one to allocate the page structures and
configure the zones, and another to actually online the pages of newly
installed memory.

This patch splits up the innards of online_pages() into two pieces which
correspond to these two phases. The behaviour of online_pages() itself
is unchanged.

...

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -180,31 +180,35 @@
return 0;
}

-
-int online_pages(unsigned long pfn, unsigned long nr_pages)
+/* Tell anyone who's interested that we're onlining some memory */
+static int notify_going_online(unsigned long pfn, unsigned long nr_pages)
{
- unsigned long flags;
- unsigned long onlined_pages = 0;
- struct zone *zone;
- int need_zonelists_rebuild = 0;
+ struct memory_notify arg;
int nid;
int ret;
- struct memory_notify arg;

arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = -1;
-
+
nid = page_to_nid(pfn_to_page(pfn));
if (node_present_pages(nid) == 0)
arg.status_change_nid = nid;

That's kind a weird line in the patch. How'd that get there? Why are
you moving 'arg'?

arg is the notifier arg. This function is just wrapping up the GOING_ONLINE notifier. The code is just a copy, so it isn't doing anything it wasn't doing before.

The original code also recycles arg for the ONLINE notifier, but that seemed unnecessary.

This look OK, but it does add ~45 lines of code, and I'm not immediately
sure how you're going to use it. Could you address that a bit?

Sure. When the balloon driver wants to increase the domain's size, but it finds its run out of page structures to grow into, it hotplug-adds some memory. This code uses the add_memory_resource() function I posted the patch for yesterday. (Error-checking removed for brevity.)

static void balloon_expand(unsigned pages)
{
struct resource *res;
int ret;
u64 size = (u64)pages * PAGE_SIZE;
unsigned pfn;
unsigned start_pfn, end_pfn;

res = kzalloc(sizeof(*res), GFP_KERNEL);

res->name = "Xen Balloon";
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;

ret = allocate_resource(&iomem_resource, res, size, 0, -1,
1ul << SECTION_SIZE_BITS, NULL, NULL);

start_pfn = res->start >> PAGE_SHIFT;
end_pfn = (res->end + 1) >> PAGE_SHIFT;

ret = add_memory_resource(0, res);
ret = prepare_online_pages(start_pfn, pages);

for(pfn = start_pfn; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_page(pfn);

SetPageReserved(page);
set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
balloon_append(page); /* add to a list of balloon pages */
}
}

So this just gives us some page structures, but there's no underlying memory yet. Later, the balloon driver starts populating the pages with real memory behind them:

for (i = 0; i < nr_pages; i++) {
page = balloon_retrieve();

pfn = page_to_pfn(page);

/* frame_list is set of real memory pages */
set_phys_to_machine(pfn, frame_list[i]);

/* Relinquish the page back to the allocator. */
mark_pages_onlined(pfn, 1);

/* Link back into the page tables if not highmem. */
if (!PageHighMem(page)) {
int ret;
ret = HYPERVISOR_update_va_mapping(
(unsigned long)__va(pfn << PAGE_SHIFT),
mfn_pte(frame_list[i], PAGE_KERNEL),
0);

}
}

I do kinda wish you'd take a real hard look at what the new functions
are doing now. I'm not sure their current names are very good.

You're right. This is very much a first pass.

+/* Grow the zone to fit the expected amount of memory being added */
+static struct zone *online_pages_zone(unsigned long pfn, unsigned long nr_pages)

The comment is good, but the function name is not. :) How about
grow_zone_span() or something?

Sure. I'm not really sure what the bookkeeping its doing really means though.

+/* Mark a set of pages as online */
+unsigned long mark_pages_onlined(unsigned long pfn, unsigned long nr_pages)

Isn't the comment on this one a bit redundant? :)

This looks to me to have become the real online_pages() now. This
function is what goes and individually onlines pages. If someone was
trying to figure out whether to call online_pages() or
mark_pages_onlined(), which one would they know to call?

Yep. My goal in this was to extract he behaviours I need without affecting any other users. online_pages() is definitely a trivial helper function now, and it could be removed without causing much damage to its couple of callers.

- if (onlined_pages)
+ if (onlined_pages) {
+ struct memory_notify arg;
+
+ arg.start_pfn = pfn; /* ? */
+ arg.nr_pages = onlined_pages;
+ arg.status_change_nid = -1; /* ? */
+
memory_notify(MEM_ONLINE, &arg);
+ }

We should really wrap up memory notify:

static void memory_notify(int state, unsigned long start_pfn,
unsigned long nr_pages, int status_change_nid)
{
struct memory_notify arg;
arg.start_pfn = start_pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = status_change_nid;
return the_current_memory_notify(state, &arg);
}

We can use that in a couple of spots, right?

Perhaps. Or we could get rid of it altogether. There's only a single user of the notifier (mm/slub.c), and given that it doesn't even use the MEM_ONLINE notifier, we could just drop this part. Seems like it would be simpler to just have the hotplug code call directly into slub if that's what it needs...

My big remaining problem is how to disable the sysfs interface for this memory. I need to prevent any onlining via /sys/device/system/memory.

Looks like I need to modify register_new_memory() to pass a "don't change state" flag, and stash it in struct memory_block so that store_mem_state() knows to ignore any state changes. But it's not clear how I can get that information down into __add_section()... I guess I just need to propagate it down from add_memory().

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Jeff Garzik: "[git patches] net driver fixes"
Previous message: Jeff Garzik: "Re: [PATCH 2/2] Blackfin EMAC Driver: delete unused variables tofixup gcc warnings"
In reply to: Dave Hansen: "Re: [PATCH RFC] hotplug-memory: refactor online_pages to separatezone growth from page onlining"
Next in thread: Dave Hansen: "Re: [PATCH RFC] hotplug-memory: refactor online_pages to separatezone growth from page onlining"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]