Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview

From: Robin Holt
Date: Tue Feb 15 2005 - 07:24:56 EST


On Tue, Feb 15, 2005 at 12:53:03PM +0100, Andi Kleen wrote:
> > (2) You really only want to migrate pages once. If a file is mapped
> > into several of the pid's that are being migrated, then you want
> > to figure this out and issue one call to have it moved wrt one of
> > the pid's.
> > (The page migration code from the memory hotplug patch will handle
> > updating the pte's of the other processs (thank goodness for
> > rmap...))
>
> I don't get this. Surely the migration code will check if a page
> is already in the target node, and when that is the case do nothing.
>
> How could this "double migration" happen?

A node is not always equal distant to a cpu. We need to keep node-to-cpu
distant relatively constant between the original and final placement.
There may be a time where you are moving stuff from node 8 to node 4
and stuff from node 12 to node 8. If you scan the vmas for both the
processes in the wrong order you will migrate memory from node 12 to 8
for the second process and then from node 8 to node 4 for the second.

> > (3) In the case where a particular file is mapped into different
> > processes at different file offsets (and we are migrating both
> > of the processes), one has to examine the file offsets to figure
> > out if the mappings overlap or not. If they overlap, then you've
> > got to issue two calls, each of which describes a non-overlapping
> > region; both calls taken together would cover the entire range
> > of pages mapped to the file. Similarly if the ranges do not
> > overlap.
>
> That sounds like a quite obscure corner case which I'm not sure
> is worth all the complexity.

So obscure that nearly every example batch job we looked at had exactly
this circumstance. Turns out that quite a few batch jobs we looked at
have a parent that maps their working set initially. After the workers
are forked, they map some part of the same data file to different parts
of their own address space. They also commonly map over the top of the
large file mapping that was originally done leaving us with a jumble of
address space. This really showed the need for a user-space application
to figure the problem out and allow the flexibility to come up with more
advanced migration algorithms.

Thanks,
Robin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/