Re: [PATCH 11/13] mm/munlock: page migration needs mlock pagevec drained

From: Vlastimil Babka
Date: Fri Feb 11 2022 - 13:49:18 EST


On 2/6/22 22:49, Hugh Dickins wrote:
> Page migration of a VM_LOCKED page tends to fail, because when the old
> page is unmapped, it is put on the mlock pagevec with raised refcount,
> which then fails the freeze.
>
> At first I thought this would be fixed by a local mlock_page_drain() at
> the upper rmap_walk() level - which would have nicely batched all the
> munlocks of that page; but tests show that the task can too easily move
> to another cpu, leaving pagevec residue behind which fails the migration.
>
> So try_to_migrate_one() drain the local pagevec after page_remove_rmap()
> from a VM_LOCKED vma; and do the same in try_to_unmap_one(), whose
> TTU_IGNORE_MLOCK users would want the same treatment; and do the same
> in remove_migration_pte() - not important when successfully inserting
> a new page, but necessary when hoping to retry after failure.
>
> Any new pagevec runs the risk of adding a new way of stranding, and we
> might discover other corners where mlock_page_drain() or lru_add_drain()
> would now help. If the mlock pagevec raises doubts, we can easily add a
> sysctl to tune its length to 1, which reverts to synchronous operation.

Not a fan of adding new sysctls like those as that just pushes the failure
of kernel devs to poor admins :)
The old pagevec usage deleted by patch 1 was limited to the naturally larger
munlock_vma_pages_range() operation. The new per-cpu based one is more
general, which obviously has its advantages, but then it might bring new
corner cases.
So if this turns out to be an big problem, I would rather go back to the
limited scenario pagevec than a sysctl?

> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>

> ---
> mm/migrate.c | 2 ++
> mm/rmap.c | 4 ++++
> 2 files changed, 6 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f4bcf1541b62..e7d0b68d5dcb 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -251,6 +251,8 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
> page_add_file_rmap(new, vma, false);
> set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
> }
> + if (vma->vm_flags & VM_LOCKED)
> + mlock_page_drain(smp_processor_id());
>
> /* No need to invalidate - it was non-present before */
> update_mmu_cache(vma, pvmw.address, pvmw.pte);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 5442a5c97a85..714bfdc72c7b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1656,6 +1656,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> * See Documentation/vm/mmu_notifier.rst
> */
> page_remove_rmap(subpage, vma, PageHuge(page));
> + if (vma->vm_flags & VM_LOCKED)
> + mlock_page_drain(smp_processor_id());
> put_page(page);
> }
>
> @@ -1930,6 +1932,8 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
> * See Documentation/vm/mmu_notifier.rst
> */
> page_remove_rmap(subpage, vma, PageHuge(page));
> + if (vma->vm_flags & VM_LOCKED)
> + mlock_page_drain(smp_processor_id());
> put_page(page);
> }
>