Re: [PATCH] mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code

From: Yang Shi
Date: Tue Jul 20 2021 - 16:58:06 EST


On Mon, Jul 19, 2021 at 11:56 PM Huang Ying <ying.huang@xxxxxxxxx> wrote:
>
> Before the commit c5b5a3dd2c1f ("mm: thp: refactor NUMA fault
> handling"), the TLB flushing is done in do_huge_pmd_numa_page() itself
> via flush_tlb_range().
>
> But after commit c5b5a3dd2c1f ("mm: thp: refactor NUMA fault
> handling"), the TLB flushing is done in migrate_pages() anyway, via
> the following code path:
>
> do_huge_pmd_numa_page
>   migrate_misplaced_page
>     migrate_pages
>
> So now, the TLB flushing code in do_huge_pmd_numa_page() becomes
> unnecessary and is deleted in this patch to simplify the code. This
> is code cleanup only; there's no visible performance difference.

Yes, there is a TLB flush in try_to_migrate(), but it seems the mmu
notifier invalidate is missed for the THP migration case. I'm not
quite sure why it is not needed; maybe it was just missed?

So, you may need the below change too:

diff --git a/mm/rmap.c b/mm/rmap.c
index 2d29a57d29e8..e1c8b654563d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1749,6 +1749,8 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
 				       !PageTransCompound(page), page);
 
 			set_pmd_migration_entry(&pvmw, page);
+			mmu_notifier_invalidate_range(mm, range.start,
+						      range.end);
 			continue;
 		}
 #endif
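
For reference, with that change applied, the PMD-mapped THP migration
entry branch in try_to_migrate_one() would read roughly as below (a
sketch against the current code; surrounding context and exact
indentation are approximate):

```c
#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
		/* PMD-mapped THP migration entry */
		if (!pvmw.pte) {
			VM_BUG_ON_PAGE(PageHuge(page) ||
				       !PageTransCompound(page), page);

			set_pmd_migration_entry(&pvmw, page);
			/*
			 * Notify secondary MMUs now that the PMD has been
			 * replaced by a migration entry, since there is no
			 * later ->invalidate_range() call on this path.
			 */
			mmu_notifier_invalidate_range(mm, range.start,
						      range.end);
			continue;
		}
#endif
```

That would keep the secondary MMUs (e.g. KVM, IOMMU users) in sync the
same way the PTE-mapped path already does after pte_clear()-style
updates.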

>
> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
> Cc: Yang Shi <shy828301@xxxxxxxxx>
> Cc: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Christian Borntraeger <borntraeger@xxxxxxxxxx>
> Cc: Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx>
> Cc: Heiko Carstens <hca@xxxxxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Vasily Gorbik <gor@xxxxxxxxxxxxx>
> Cc: Zi Yan <ziy@xxxxxxxxxx>
> ---
> mm/huge_memory.c | 26 --------------------------
> 1 file changed, 26 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index afff3ac87067..9f21e44c9030 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1440,32 +1440,6 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
> goto out;
> }
>
> - /*
> - * Since we took the NUMA fault, we must have observed the !accessible
> - * bit. Make sure all other CPUs agree with that, to avoid them
> - * modifying the page we're about to migrate.
> - *
> - * Must be done under PTL such that we'll observe the relevant
> - * inc_tlb_flush_pending().
> - *
> - * We are not sure a pending tlb flush here is for a huge page
> - * mapping or not. Hence use the tlb range variant
> - */
> - if (mm_tlb_flush_pending(vma->vm_mm)) {
> - flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
> - /*
> - * change_huge_pmd() released the pmd lock before
> - * invalidating the secondary MMUs sharing the primary
> - * MMU pagetables (with ->invalidate_range()). The
> - * mmu_notifier_invalidate_range_end() (which
> - * internally calls ->invalidate_range()) in
> - * change_pmd_range() will run after us, so we can't
> - * rely on it here and we need an explicit invalidate.
> - */
> - mmu_notifier_invalidate_range(vma->vm_mm, haddr,
> - haddr + HPAGE_PMD_SIZE);
> - }
> -
> pmd = pmd_modify(oldpmd, vma->vm_page_prot);
> page = vm_normal_page_pmd(vma, haddr, pmd);
> if (!page)
> --
> 2.30.2
>