[PATCH] mm/numa/thp: fix assumptions of migrate_misplaced_transhuge_page()

From: Jérôme Glisse
Date: Tue May 03 2016 - 05:53:24 EST


Fix assumptions in migrate_misplaced_transhuge_page(), which is only
called by do_huge_pmd_numa_page(), itself only called by
__handle_mm_fault() for a pmd with PROT_NONE. This means that if the
pmd stays the same then there can be no concurrent get_user_pages /
get_user_pages_fast (GUP/GUP_fast). Moreover, because
migrate_misplaced_transhuge_page() only does something if the page is
mapped once, there can be no GUP from a different process. Finally,
holding the pmd lock assures us that no other part of the kernel will
take an extra reference on the page.

In the end this means that the failure code path should never be
taken unless something is horribly wrong, so convert it to BUG_ON().
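
For reference, the call chain and the invariants relied on above look
roughly like the condensed sketch below (illustrative only, not the
literal mm/migrate.c code; bodies are elided):

    __handle_mm_fault()
      -> do_huge_pmd_numa_page()          /* only reached for pmd_protnone() pmds */
        -> migrate_misplaced_transhuge_page()

    /* inside migrate_misplaced_transhuge_page(), condensed: */
    if (!pmd_protnone(entry))             /* not a NUMA hinting pmd: bail out */
            goto out_unlock;
    ...
    ptl = pmd_lock(mm, pmd);              /* excludes any other pmd-level user */
    if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
            /* pmd changed under us: put the new page back and give up
             * (this early-failure path is kept by the patch) */
    }
    ...
    /* one reference from the mapping plus one from isolation; nothing else
     * can take a reference while the pmd is PROT_NONE and ptl is held */
    BUG_ON(page_count(page) != 2);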

Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
---
mm/migrate.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 6c822a7..6315aac 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1757,6 +1757,14 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
pmd_t orig_entry;

/*
+ * What we do here is only valid if pmd_protnone(entry) is true and the
+ * page is mapped in only one vma; numamigrate_isolate_page() takes care
+ * of that check.
+ */
+ if (!pmd_protnone(entry))
+ goto out_unlock;
+
+ /*
* Rate-limit the amount of data that is being migrated to a node.
* Optimal placement is no good if the memory bus is saturated and
* all the time is being spent migrating!
@@ -1797,7 +1805,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
ptl = pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
-fail_putback:
spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

@@ -1819,7 +1826,12 @@ fail_putback:
goto out_unlock;
}

- orig_entry = *pmd;
+ /*
+ * We are holding the lock, so no one can set a new pmd, and the original
+ * pmd is PROT_NONE, thus no one can get_user_pages or get_user_pages_fast
+ * (GUP or GUP_fast). From this point on we cannot fail.
+ */
+ orig_entry = entry;
entry = mk_pmd(new_page, vma->vm_page_prot);
entry = pmd_mkhuge(entry);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
@@ -1837,14 +1849,13 @@ fail_putback:
set_pmd_at(mm, mmun_start, pmd, entry);
update_mmu_cache_pmd(vma, address, &entry);

- if (page_count(page) != 2) {
- set_pmd_at(mm, mmun_start, pmd, orig_entry);
- flush_pmd_tlb_range(vma, mmun_start, mmun_end);
- mmu_notifier_invalidate_range(mm, mmun_start, mmun_end);
- update_mmu_cache_pmd(vma, address, &entry);
- page_remove_rmap(new_page, true);
- goto fail_putback;
- }
+ /* As said above, no one can get a reference on the old page, neither
+ * through get_user_pages or get_user_pages_fast (GUP/GUP_fast) nor
+ * through any other means. To get a reference on a huge page you need
+ * to hold the pmd_lock, and we are already holding that lock here and
+ * the page is only mapped once.
+ */
+ BUG_ON(page_count(page) != 2);

mlock_migrate_page(new_page, page);
page_remove_rmap(page, true);
--
2.1.0

