Re: [RFC -mm] mm, userfaultfd, THP: Avoid waiting when PMD under THP migration

From: Andrea Arcangeli
Date: Mon Nov 06 2017 - 15:35:37 EST


On Mon, Nov 06, 2017 at 10:53:48AM -0500, Zi Yan wrote:
> Thanks for clarifying it. We both agree that !pmd_present(), which means
> PMD migration entry, does not get into userfaultfd_must_wait(),
> then there seems to be no issue with current code yet.
>
> However, the if (!pmd_present(_pmd)) in userfaultfd_must_wait() does not
> match
> the exact condition. How about the patch below? It can catch pmd
> migration entries,
> which are only possible in x86_64 at the moment.
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 1c713fd5b3e6..dda25444a6ee 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -294,9 +294,11 @@ static inline bool userfaultfd_must_wait(struct
> userfaultfd_ctx *ctx,
> * pmd_trans_unstable) of the pmd.
> */
> _pmd = READ_ONCE(*pmd);
> - if (!pmd_present(_pmd))
> + if (pmd_none(_pmd))
> goto out;
>
> + VM_BUG_ON(thp_migration_supported() && is_pmd_migration_entry(_pmd));
> +

As I wrote in prev email I'm not sure about this invariant to be
correct 100% of the time (plus we'd want a VM_WARN_ON only
here). Specifically, what does prevent try_to_unmap to run on a THP
backed mapping with only the mmap_sem for reading?

I know what prevents to ever reproduce this in practice though (aside
from the fact the race between the is_swap_pmd() check in the main
page fault and the above check is small) and it's because compaction
won't migrate THP and even the numa faults will not use the migration
entry. So it'd require some more explicit migration numactl while
userfaults are running to ever see an hang in there.

I think it's a regression since the introduction of THP migration
around commits 84c3fc4e9c563d8fb91cfdf5948da48fe1af34d3 /
616b8371539a6c487404c3b8fb04078016dab4ba /
9c670ea37947a82cb6d4df69139f7e46ed71a0ac etc.. before that pmd_none or
!pmd_present used to be equivalent, not the case any longer. Of course
pmd_none would have been better before too.

Thanks,
Andrea