Re: [PATCH v2 4/8] mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT

From: Mel Gorman
Date: Wed Aug 02 2023 - 11:30:43 EST

On Tue, Aug 01, 2023 at 02:48:40PM +0200, David Hildenbrand wrote:
> Commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from
> gup/gup_fast") from 2012 documented as the primary reason why we would want
> to handle NUMA hinting faults from GUP:
> KVM secondary MMU page faults will trigger the NUMA hinting page
> faults through gup_fast -> get_user_pages -> follow_page ->
> handle_mm_fault.
> That is still the case today, and relevant KVM code has been converted to
> manually set FOLL_HONOR_NUMA_FAULT. So let's stop setting
> FOLL_HONOR_NUMA_FAULT for all GUP users and cross fingers that not that
> many other ones that really require such handling for autonuma remain.
> Possible interaction with MMU notifiers:
> Assume a driver obtains a page using get_user_pages() to map it into
> a secondary MMU, and uses the MMU notifier framework to get notified on
> changes.
> Assume get_user_pages() succeeded on a PROT_NONE-mapped page (because
> FOLL_HONOR_NUMA_FAULT is not set) in an accessible VMA and the page is
> mapped into a secondary MMU. Once user space would turn that mapping
> inaccessible using mprotect(PROT_NONE), the actual PTE in the page table
> might not change. If the MMU notifier would be smart and optimize for that
> case "why notify if the PTE didn't change", that could be problematic.
> At least change_pmd_range() with MMU_NOTIFY_PROTECTION_VMA for now does an
> unconditional mmu_notifier_invalidate_range_start() ->
> mmu_notifier_invalidate_range_end() and should be fine.
> Note that even if a PTE in an accessible VMA is pte_protnone(), the
> underlying page might be accessed by a secondary MMU that does not set
> FOLL_HONOR_NUMA_FAULT, and test_young() MMU notifiers would return "true".
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>

Also seems sane but a large portion of its correctness also depends on
patch 3 being correct.

Mel Gorman