Re: [PATCH 1/3] mm, numa: rework do_pages_move

From: Anshuman Khandual
Date: Thu Jan 04 2018 - 22:52:54 EST


On 01/03/2018 01:55 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
>
> do_pages_move is supposed to move user defined memory (an array of
> addresses) to the user defined numa nodes (an array of nodes one for
> each address). The user provided status array then contains resulting
> numa node for each address or an error. The semantic of this function is
> little bit confusing because only some errors are reported back. Notably
> migrate_pages error is only reported via the return value. This patch
> doesn't try to address these semantic nuances but rather change the
> underlying implementation.
>
> Currently we are processing user input (which can be really large)
> in batches which are stored to a temporarily allocated page. Each
> address is resolved to its struct page and stored to page_to_node
> structure along with the requested target numa node. The array of these
> structures is then conveyed down the page migration path via private
> argument. new_page_node then finds the corresponding structure and
> allocates the proper target page.
>
> What is the problem with the current implementation and why to change
> it? Apart from being quite ugly it also doesn't cope with unexpected
> pages showing up on the migration list inside migrate_pages path.
> That doesn't happen currently but the follow up patch would like to
> make the thp migration code more clear and that would need to split a
> THP into the list for some cases.
>
> How does the new implementation work? Well, instead of batching into a
> fixed size array we simply batch all pages that should be migrated to
> the same node and isolate all of them into a linked list which doesn't
> require any additional storage. This should work reasonably well because
> page migration usually migrates larger ranges of memory to a specific
> node. So the common case should work equally well as the current
> implementation. Even if somebody constructs an input where the target
> numa nodes would be interleaved we shouldn't see a large performance
> impact because page migration alone doesn't really benefit from
> batching. mmap_sem batching for the lookup is quite questionable and
> isolate_lru_page which would benefit from batching is not using it even
> in the current implementation.

Hi Michal,

After slightly modifying your test case (like fixing the page size for
powerpc and just doing simple migration from node 0 to 8 instead of the
interleaving), I tried to measure the migration speed with and without
the patches on mainline. Its interesting....

10000 pages | 100000 pages
--------------------------
Mainline 165 ms 1674 ms
Mainline + first patch (move_pages) 191 ms 1952 ms
Mainline + all three patches 146 ms 1469 ms

Though overall it gives performance improvement, some how it slows
down migration after the first patch. Will look into this further.