Re: [PATCH] mm: Add an explicit smp_wmb() to UFFDIO_CONTINUE

From: Peter Xu
Date: Wed Mar 06 2024 - 19:23:41 EST


On Wed, Mar 06, 2024 at 09:57:17AM -0800, James Houghton wrote:
> On Tue, Mar 5, 2024 at 7:41 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> >
> > On Wed, Mar 06, 2024 at 12:15:10AM +0000, James Houghton wrote:
> > > I've tried to see if I can legitimately get a user to read stale data,
> > > and a few attempts with this test[2] have been unsuccessful.
> >
> > AFAICT that won't easily reproduce even if the problem existed, as there
> > are so many implicit memory barriers here and there. E.g. right at the
> > entry of ioctl(), mmget_not_zero() already contains a full ordering
> > constraint:
> >
> > /**
> >  * atomic_inc_not_zero() - atomic increment unless zero with full ordering
> >  * @v: pointer to atomic_t
>
> Oh yes, of course. Thanks for pointing this out. So we definitely
> don't need a Fixes.
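
Right.  For other readers of the thread: mmget_not_zero() boils down to
something like the below (a rough sketch from memory; see
include/linux/sched/mm.h for the real thing):

	static inline bool mmget_not_zero(struct mm_struct *mm)
	{
		/*
		 * atomic_inc_not_zero() is documented to provide full
		 * ordering when it succeeds, so every userfaultfd ioctl()
		 * already implies a full memory barrier right at entry,
		 * before any of the fill/continue work runs.
		 */
		return atomic_inc_not_zero(&mm->mm_users);
	}
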
>
> >
> > I was expecting the syscall routine would already guarantee an ordering,
> > but indeed I can't find any. I also checked Intel's spec, and the SYSCALL
> > instruction documentation only has one paragraph on ordering:
> >
> > Instruction ordering. Instructions following a SYSCALL may be
> > fetched from memory before earlier instructions complete execution,
> > but they will not execute (even speculatively) until all
> > instructions prior to the SYSCALL have completed execution (the
> > later instructions may execute before data stored by the earlier
> > instructions have become globally visible).
> >
> > I guess it implies a hardware reordering is indeed possible in this case?
>
> That's my understanding as well.

Let's also mention that in the commit message when reposting? It may answer
the same question for other readers of this patch.
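
E.g. the commit message could spell out the problematic pattern, roughly
(a pseudocode sketch only; the variable names are made up, not the exact
code paths):

	/* Thread A: populates the page, then asks the kernel to map it */
	memcpy(page_in_other_mapping, data, PAGE_SIZE);	/* plain stores        */
	ioctl(uffd, UFFDIO_CONTINUE, &cont);		/* ends in set_pte_at() */

	/* Thread B: faulting on the same address; once it observes the
	 * newly installed PTE it reads the page contents directly. */

	/*
	 * Per the SYSCALL note above, the kernel side may start executing
	 * before Thread A's stores are globally visible, and the CONTINUE
	 * path itself has no barrier ordering those stores against the PTE
	 * install (today it only works out thanks to e.g. the full ordering
	 * implied by mmget_not_zero()).  Hence the explicit smp_wmb()
	 * before set_pte_at().
	 */
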

>
> >
> > >
> > > [1]: commit 153132571f02 ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
> > > [2]: https://gist.github.com/48ca/38d0665b0f1a6319a56507dc73a173f9
> > >
> > >  mm/hugetlb.c     | 15 +++++++++------
> > >  mm/userfaultfd.c | 18 ++++++++++++++++++
> > >  2 files changed, 27 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index bb17e5c22759..533bf6b2d94d 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -6779,12 +6779,15 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> > >  		}
> > >  	}
> > >
> > > -	/*
> > > -	 * The memory barrier inside __folio_mark_uptodate makes sure that
> > > -	 * preceding stores to the page contents become visible before
> > > -	 * the set_pte_at() write.
> > > -	 */
> > > -	__folio_mark_uptodate(folio);
> > > +	if (!is_continue) {
> > > +		/*
> > > +		 * The memory barrier inside __folio_mark_uptodate makes sure
> > > +		 * that preceding stores to the page contents become visible
> > > +		 * before the set_pte_at() write.
> > > +		 */
> > > +		__folio_mark_uptodate(folio);
> >
> > Can we move the comment above the "if", explaining both conditions?
>
> Yes, I'll do that. I think the explanation for
> WARN_ON_ONCE(!folio_test_uptodate(folio)) is:
>
> We only need to `__folio_mark_uptodate(folio)` if we have
> allocated a new folio, and HugeTLB pages will always be Uptodate if
> they are in the pagecache.
>
> We could even drop the WARN_ON_ONCE.

No strong opinion; keeping it still makes sense to me. Btw, it'll also be
important to document how the ordering is guaranteed for CONTINUE, and you
can reference the new code you add there, as readers may otherwise wonder
why CONTINUE doesn't need such ordering here.
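
Something along these lines, perhaps (just a sketch, wording is up to you):

	/*
	 * If we allocated or copied into a new folio, the memory barrier
	 * inside __folio_mark_uptodate() makes sure that the preceding
	 * stores to the page contents become visible before the
	 * set_pte_at() write.
	 *
	 * For UFFDIO_CONTINUE the folio is already in the page cache and
	 * hugetlb folios there are always uptodate; the ordering against
	 * the page contents is instead provided by the smp_wmb() this
	 * patch adds in mm/userfaultfd.c.
	 */
	if (!is_continue)
		__folio_mark_uptodate(folio);
	else
		WARN_ON_ONCE(!folio_test_uptodate(folio));
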

Thanks,

--
Peter Xu