Re: page migration patchset

From: Steve Longerbeam
Date: Tue Jan 11 2005 - 14:07:29 EST


Ray Bryant wrote:

Andi and Steve,

Steve Longerbeam wrote:
<snip>


My personal preference would be to keep as much of this as possible
under user space control; that is, rather than having a big autonomous
system call that migrates pages and then updates policy information,
I'd prefer to split the work into several smaller system calls that
are issued by a user space program responsible for coordinating the
process migration as a series of steps, e.g.:

(1) suspend the process via SIGSTOP
(2) update the mempolicy information
(3) migrate the process's pages
(4) migrate the process to the new cpu via sched_setaffinity()
(5) resume the process via SIGCONT
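
In rough C, such a coordinating program might look something like the
sketch below. migrate_pages_and_policy() is a hypothetical placeholder
for whatever interface comes out of this discussion for steps (2) and
(3); the rest uses only existing calls.

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>      /* sched_setaffinity(), cpu_set_t */
#include <signal.h>     /* kill(), SIGSTOP, SIGCONT */
#include <sys/types.h>

/* Hypothetical placeholder for steps (2) and (3). */
int migrate_pages_and_policy(pid_t pid, int target_node);

int move_process(pid_t pid, int target_node, const cpu_set_t *new_cpus)
{
        int ret;

        if (kill(pid, SIGSTOP) != 0)            /* (1) suspend */
                return -errno;

        /* (2)+(3) update the mempolicy and migrate resident pages */
        ret = migrate_pages_and_policy(pid, target_node);

        /* (4) move the task to the cpus of the new node */
        if (ret == 0 &&
            sched_setaffinity(pid, sizeof(*new_cpus), new_cpus) != 0)
                ret = -errno;

        kill(pid, SIGCONT);                     /* (5) resume */
        return ret;
}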


steps 2 and 3 can be accomplished by a call to mbind() and
specifying MPOL_MF_MOVE. And since mbind() takes an
address range, you could probably migrate pages and change
the policies for all of the process' mappings in a single mbind()
call.
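
For illustration, a single-range call of that sort might look like the
sketch below, assuming libnuma's <numaif.h> and the MPOL_MF_MOVE flag
from the patchset (not in the mainline kernel at the time of writing):

#include <numaif.h>     /* mbind(), MPOL_BIND, MPOL_MF_MOVE (libnuma) */
#include <stdio.h>

/* Rebind one address range to target_node and ask the kernel to
 * migrate any pages already resident elsewhere (MPOL_MF_MOVE). */
int rebind_and_move(void *start, unsigned long len, int target_node)
{
        unsigned long nodemask = 1UL << target_node;

        if (mbind(start, len, MPOL_BIND, &nodemask,
                  sizeof(nodemask) * 8, MPOL_MF_MOVE) != 0) {
                perror("mbind");
                return -1;
        }
        return 0;
}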


OK, I just got around to looking into this suggestion. Unfortunately,
it doesn't look as if this will do what I want. I need to be able to
preserve the topology of the application when it is migrated (required
to give the application the same performance in its new location that
it got in its old location).


I see what you mean: unless the requested address range exactly
coincides with an existing vma, existing vma's will get split up.

So, I need to be able to say "take the
pages on this node and move them to that node". The sys_mbind() call
doesn't have the necessary arguments to do this. I'm thinking of
something like:

migrate_process_pages(pid, numnodes, oldnodelist, newnodelist);

This would scan the address space of process pid, and each page that
is found on oldnodelist[i] would be moved to node newnodelist[i].
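
In rough C terms, the intended interface would be something along these
lines (purely hypothetical, nothing like it exists today; the two node
lists are parallel arrays):

#include <sys/types.h>

/* Hypothetical call: a page found on oldnodelist[i] is moved to
 * newnodelist[i]; pages on nodes not listed are left where they are. */
int migrate_process_pages(pid_t pid, int numnodes,
                          const int *oldnodelist, const int *newnodelist);

/* Example: move a job's pages from nodes {0,1} to nodes {4,5},
 * preserving its internal placement. */
int move_job(pid_t pid)
{
        int old_nodes[] = { 0, 1 };
        int new_nodes[] = { 4, 5 };

        return migrate_process_pages(pid, 2, old_nodes, new_nodes);
}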


right, that's something I'd be interested in as well. In fact, an address
range is not ideal for me either - what I really need is an API that
allows me to specify a single existing vma (or all the process'
regions in your case) that is to have its policy changed and resident
pages migrated, without changing the topology (e.g. splitting vma's).


Pages that are found to be swapped out would be handled as follows:
Add the original node id to either the swap pte or the swp_entry_t.
Swap-in will be modified to allocate the page on the same node it
came from. Then, as part of migrate_process_pages, all that would
be done for swapped out pages would be to change the "original node"
field to point at the new node.
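
Purely to illustrate the encoding idea (real swp_entry_t layouts are
arch-specific and have no such field; the shift and width below are
invented for the sketch):

#include <stdint.h>

/* Invented layout: keep the origin node in the top 8 bits of a
 * 64-bit swap cookie. */
#define SWP_NODE_SHIFT  56
#define SWP_NODE_MASK   (0xffULL << SWP_NODE_SHIFT)

static inline uint64_t swp_entry_set_node(uint64_t entry, unsigned int node)
{
        return (entry & ~SWP_NODE_MASK) | ((uint64_t)node << SWP_NODE_SHIFT);
}

static inline unsigned int swp_entry_node(uint64_t entry)
{
        return (unsigned int)((entry & SWP_NODE_MASK) >> SWP_NODE_SHIFT);
}

/* migrate_process_pages() would then only rewrite this field for
 * swapped-out pages, and swap-in would allocate on swp_entry_node(). */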


isn't this already taken care of? read_swap_cache_async() is given
a vma, and passes it to alloc_page_vma(). So if you have earlier
changed the policy for that vma, the new policy will be used
when allocating the page during swap-in.
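
Roughly, the call chain in question (paraphrased from memory of the
2.6 sources, not verbatim kernel code):

/*
 *   do_swap_page()
 *     -> read_swap_cache_async(entry, vma, address)
 *          -> alloc_page_vma(GFP_HIGHUSER, vma, address)
 *               -> picks the node according to the vma's mempolicy
 *                  (falling back to the task policy), so a policy
 *                  changed earlier is honoured at swap-in time.
 */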

Steve