Re: [PATCH RFC v2 02/12] mm/hugetlb: Move swap entry handling into vma lock for fault

From: Peter Xu
Date: Thu Nov 17 2022 - 20:36:33 EST


On Thu, Nov 17, 2022 at 08:10:15PM -0500, Peter Xu wrote:
> In hugetlb_fault(), there used to have a special path to handle swap entry
> at the entrance using huge_pte_offset(). That's unsafe because
> huge_pte_offset() for a pmd sharable range can access freed pgtables if
> without either the walker lock or vma lock.
>
> Here the simplest solution for making it safe is just to move the swap
> handling to be after the vma lock being held. We may need to take the
> fault mutex on either migration or hwpoison entries now (also the vma lock,
> but that's really needed), however neither of them is hot path so it should
> be fine.
>
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
> mm/hugetlb.c | 24 +++++++-----------------
> 1 file changed, 7 insertions(+), 17 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c3aab6d5b7aa..62ff3fc51d4e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5824,22 +5824,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> int need_wait_lock = 0;
> unsigned long haddr = address & huge_page_mask(h);
>
> - ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
> - if (ptep) {
> - /*
> - * Since we hold no locks, ptep could be stale. That is
> - * OK as we are only making decisions based on content and
> - * not actually modifying content here.
> - */
> - entry = huge_ptep_get(ptep);
> - if (unlikely(is_hugetlb_entry_migration(entry))) {
> - migration_entry_wait_huge(vma, ptep);
> - return 0;
> - } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> - return VM_FAULT_HWPOISON_LARGE |
> - VM_FAULT_SET_HINDEX(hstate_index(h));
> - }
> -
> /*
> * Serialize hugepage allocation and instantiation, so that we don't
> * get spurious allocation failures if two CPUs race to instantiate
> @@ -5886,8 +5870,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> * fault, and is_hugetlb_entry_(migration|hwpoisoned) check will
> * properly handle it.
> */
> - if (!pte_present(entry))
> + if (!pte_present(entry)) {
> + if (unlikely(is_hugetlb_entry_migration(entry)))
> + migration_entry_wait_huge(vma, ptep);

Hmm no, need to release the vma lock and fault mutex.. So I remembered why
I had a note that I need to rework migration wait code..

I'll try that on next version, it would be a callback just to release the
proper locks in migration_entry_wait_huge() right after releasing the
pgtable lock, in e.g. migration_entry_wait_on_locked().

> + else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> + ret = VM_FAULT_HWPOISON_LARGE |
> + VM_FAULT_SET_HINDEX(hstate_index(h));
> goto out_mutex;
> + }
>
> /*
> * If we are going to COW/unshare the mapping later, we examine the
> --
> 2.37.3
>

--
Peter Xu