Re: [RFC 2.6.11-rc2-mm2 0/7] mm: manual page migration -- overview II

From: Ray Bryant
Date: Mon Feb 21 2005 - 02:27:55 EST


Andi Kleen wrote:
> Do you have any better way to suggest, Andi, for a batch manager to
> relocate a job? The typical scenario, as Ray explained it to me, is
>
> - Give the shared libraries and any other files a suitable policy
>   (by mapping them and applying mbind)
>
> - Then execute migrate_pages() for the anonymous pages with a suitable
>   old node -> new node mapping.
>
> How would you recommend that the batch manager move that job to the
> nodes that can run it? The layout of allocated memory pages and tasks
> for that job must be preserved in order to keep the same performance.
> The migration method needs to scale to hundreds, or more, of nodes.
>
> You have to walk the full node mapping for each array, but
> even with hundreds of nodes that should not be that costly
> (in the worst case you could create a small hash table for it
> in the kernel, but I'm not sure it's worth it)
>
> -Andi

I'm going to assume that there have been some "crossed emails" here.
I don't think that this is the interface that you and I have been
converging on. As I understood it, we were converging on the following:

(1) Extended attributes will be used to mark files as non-migratable.
(2) The page_migrate() system call will be defined as:

        page_migrate(pid, count, old_nodes, new_nodes);

    and it will migrate all pages that are either anonymous or part
    of mapped files that are not marked non-migratable.
(3) The mbind() system call with MPOL_MF_STRICT will be hooked up
    to the migration code so that it actually causes a migration.
    Processes can use this interface to migrate a portion of their own
    address space containing a mapped file.
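
To make the old_nodes/new_nodes arguments in (2) concrete, here is a
minimal user-space sketch in C of the node translation such a call would
have to perform for each page. It assumes that old_nodes[i] maps to
new_nodes[i] and that pages on nodes not listed in old_nodes are left
where they are. Both the helper name map_node() and that fallback
behaviour are assumptions made for illustration; neither comes from a
posted patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: translate the node a page currently lives on
 * into its destination node. Assumes old_nodes[i] -> new_nodes[i].
 * Returns the node unchanged when it is not listed in old_nodes
 * (an assumption; the thread does not specify this case).
 */
int map_node(int node, int count,
             const int *old_nodes, const int *new_nodes)
{
	int i;

	assert(count >= 0);
	assert(count == 0 || (old_nodes != NULL && new_nodes != NULL));

	/* Linear walk of the mapping: O(count) per lookup. */
	for (i = 0; i < count; i++) {
		if (old_nodes[i] == node)
			return new_nodes[i];
	}
	return node;
}
```

With old_nodes = {0, 1, 2} and new_nodes = {4, 5, 6}, a page on node 1
would move to node 5. Because every page on a given old node goes to the
same new node, the relative layout of the job is kept, provided the new
nodes are arranged like the old ones.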

This is different than your reply above, which seems to imply that:

(A) Step 1 is to migrate mapped files using mbind(). I don't understand
    how to do this in general, because:
    (a) I don't know how to make a non-racy list of the mapped files to
        migrate without assuming that the process to be migrated is
        stopped, and
    (b) if the mapped file is associated with the DEFAULT memory policy,
        and page placement was done by first touch, then it is not clear
        how to use mbind() to cause the pages to be migrated and still
        end up with the identical topological placement of pages after
        the migration.
(B) Step 2 is to use page_migrate() to migrate just the anonymous pages.
    I don't like the restriction of this to just anonymous pages.

Fundamentally, I don't see why (A) is much different from allowing one
process to manipulate the physical storage for another process. It's
just stated in terms of mmap'd objects instead of pids. So I don't
see why that is fundamentally different from a page_migrate() call
with va_start and va_end arguments.

So I'm going to assume that the agreement was really (1)-(3) above.

The only problem I see with that is the following: suppose that a user
wants to migrate a portion of their own address space that is composed
of (at least partly) anonymous pages or pages mapped to a file associated
with the DEFAULT memory policy, and we want the pages to be topologically
allocated the same way after the migration as they were before the
migration.

The only way I know how to do the latter is with a system call of the form:

page_migrate(pid, va_start, va_end, count, old_nodes, new_nodes);

where the permission model is that a pid can migrate any process that it
can send a signal to. So a root pid can migrate any process, and a user
pid can migrate pages of any pid started by the user.
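
The permission rule above can be tried out from user space today: kill()
with signal number 0 performs the permission check without delivering a
signal. The sketch below only illustrates the rule. The function name
may_migrate() is hypothetical, and the real check would presumably sit in
the kernel next to the migration code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical user-space approximation of the proposed rule:
 * the caller may migrate pid if it is allowed to signal pid.
 * Returns 1 if allowed, 0 if permission is denied, and -1 on any
 * other error (for example ESRCH when pid does not exist).
 */
int may_migrate(pid_t pid)
{
	/* pid <= 0 would address process groups, not a single process. */
	assert(pid > 0);

	/* Signal 0: error checking only, no signal is sent. */
	if (kill(pid, 0) == 0)
		return 1;
	return (errno == EPERM) ? 0 : -1;
}
```

A process can always signal itself, so may_migrate(getpid()) returns 1.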
--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
raybry@xxxxxxx raybry@xxxxxxxxxxxxx
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------