Re: [PATCH 1/2] mm: gup: Re-pin pages in case of trying several times to migrate

From: Baolin Wang
Date: Thu Oct 20 2022 - 22:52:12 EST




On 10/20/2022 7:43 PM, Alistair Popple wrote:

Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> writes:

On 10/20/2022 4:15 PM, Huang, Ying wrote:
Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> writes:

The migrate_pages() will return the number of {normal page, THP, hugetlb}
that were not migrated, or an error code. That means it can still return
the number of failure count, though the pages have been migrated
successfully with several times re-try.
If my understanding were correct, if pages are migrated successfully
after several times re-tries, the return value will be 0. There's one
possibility for migrate_pages() to return non-zero but all pages are
migrated. That is, when THP is split and all subpages are migrated
successfully.

Yeah, that's the case I tested. Thanks for pointing out. I'll re-write my
incorrect commit message next time.

This is confusing to me. So users of move_page() will see an
unsuccessful migration even when all subpages were migrated? Seems like

Yes.

we should fix the return code of migrate_pages() for this case where all
subpages were successfully migrated.

After more investigation, some other callers will also check the return value to see of all pages are migrated successfully. So yes, I will change the return value in migrate_pages() to fix this issue for all callers like you and Ying suggested. Thanks.

So we should not use the return value of migrate_pages() to determin
if there are pages are failed to migrate. Instead we can validate the
'movable_page_list' to see if there are pages remained in the list,
which are failed to migrate. That can mitigate the failure of longterm
pinning.
Another choice is to use a special return value for split THP + success
migration. But I'm fine to use list_empty(return_pages).

OK. Using list_empty(return_pages) looks more simple.


Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
mm/gup.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 5182aba..bd8cfcd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1914,9 +1914,10 @@ static int migrate_longterm_unpinnable_pages(
.gfp_mask = GFP_USER | __GFP_NOWARN,
};
- if (migrate_pages(movable_page_list, alloc_migration_target,
- NULL, (unsigned long)&mtc, MIGRATE_SYNC,
- MR_LONGTERM_PIN, NULL)) {
+ ret = migrate_pages(movable_page_list, alloc_migration_target,
+ NULL, (unsigned long)&mtc, MIGRATE_SYNC,
+ MR_LONGTERM_PIN, NULL);
+ if (ret < 0 || !list_empty(movable_page_list)) {
It seems that !list_empty() is sufficient here.

OK. Drop the 'ret < 0'

ret = -ENOMEM;
Why change the error code? I don't think it's a good idea to do that.

The GUP need a -errno for failure or partial success when migration, and we can
not return the number of pages failed to migrate. So returning -ENOMEM seems
suitable for both cases?

Seem reasonable to me. migrate_pages() might return -EAGAIN which would
cause everything to be re-pinned and tried again which is not what you
want here. See the comment at the start of
check_and_migrate_movable_pages().