Re: RFC for new feature to move pages from one vma to another without split

From: Peter Xu
Date: Tue May 16 2023 - 12:44:34 EST


On Mon, May 08, 2023 at 03:56:50PM -0700, Lokesh Gidra wrote:
> On Tue, Apr 11, 2023 at 8:14 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
> >
> > On Mon, Apr 10, 2023 at 12:41:31AM -0700, Lokesh Gidra wrote:
> > > On Thu, Apr 6, 2023 at 10:29 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > >
> > > > Hi, Lokesh,
> > > >
> > > > Sorry for a late reply. Copy Blake Caldwell and Mike too.
> > >
> > > Thanks for the reply. It's extremely helpful.
> > > >
> > > > On Thu, Feb 16, 2023 at 02:27:11PM -0800, Lokesh Gidra wrote:
> > > > > I) SUMMARY:
> > > > > Requesting comments on a new feature which remaps pages from one
> > > > > private anonymous mapping to another, without altering the vmas
> > > > > involved. Two alternatives exist but both have drawbacks:
> > > > > 1. userfaultfd ioctls allocate new pages, copy data and free the old
> > > > > ones even when updates could be done in-place;
> > > > > 2. mremap results in vma splitting in most of the cases due to 'pgoff' mismatch.
> > > >
> > > > Personally it was always a mistery to me on how vm_pgoff works with
> > > > anonymous vmas and why it needs to be setup with vm_start >> PAGE_SHIFT.
> > > >
> > > > Just now I tried to apply below oneliner change:
> > > >
> > > > @@ -1369,7 +1369,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > > > /*
> > > > * Set pgoff according to addr for anon_vma.
> > > > */
> > > > - pgoff = addr >> PAGE_SHIFT;
> > > > + pgoff = 0;
> > > > break;
> > > > default:
> > > > return -EINVAL;
> > > >
> > > > The kernel even boots without a major problem so far..
> > > >
> > > > I had a feeling that I miss something else here, it'll be great if anyone
> > > > knows.
> > > >
> > > > Anyway, I agree mremap() is definitely not the best way to do page level
> > > > operations like this, no matter whether vm_pgoff can match or not.
> > > >
> > > > >
> > > > > Proposing a new mremap flag or userfaultfd ioctl which enables
> > > > > remapping pages without these drawbacks. Such a feature, as described
> > > > > below, would be very helpful in efficient implementation of concurrent
> > > > > compaction algorithms.
> > > >
> > > > After I read the proposal, I had a feeling that you're not aware that we
> > > > have similar proposals adding UFFDIO_REMAP.
> > >
> > > Yes, I wasn't aware of this. Thanks a lot for sharing the details.
> > > >
> > > > I think it started with Andrea's initial proposal on the whole uffd:
> > > >
> > > > https://lore.kernel.org/linux-mm/1425575884-2574-1-git-send-email-aarcange@xxxxxxxxxx/
> > > >
> > > > Then for some reason it's not merged in initial version, but at least it's
> > > > been proposed again here (even though it seems the goal is slightly
> > > > different; that may want to move page out instead of moving in):
> > > >
> > > > https://lore.kernel.org/linux-mm/cover.1547251023.git.blake.caldwell@xxxxxxxxxxxx/
> > >
> > > Yeah, this seems to be the opposite of what I'm looking for. IIUC,
> > > page out REMAP can't
> > > satisfy any MISSING userfault. In fact, it enables MISSING faults in
> > > future. Maybe a flag
> > > can be added to uffdio_remap struct to accommodate this case, if it is
> > > still being pursued.
> >
> > Yes, I don't think that's a major problem if the use cases share mostly the
> > same fundation.
> >
> > > >
> > > > Also worth checking with the latest commit that Andrea maintains himself (I
> > > > doubt whether there's major changes, but still just to make it complete):
> > > >
> > > > https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc19e92
> > > >
> > > > So far I think that's what you're looking for. I'm not sure whether the
> > > > limitations will be a problem, though, at least mentioned in the old
> > > > proposals of UFFDIO_REMAP. For example, it required not only anonymous but
> > > > also mapcount==1 on all src pages. But maybe that's not a problem here
> > > > too.
> > >
> > > Yes, this is exactly what I am looking for. The mapcount==1 is not a
> > > problem either. Any idea why the patch isn't merged?
> >
> > The initial verion of discussion mentioned some of the reason of lacking
> > use cases:
> >
> > https://lore.kernel.org/linux-mm/20150305185112.GL4280@xxxxxxxxxx/
> >
> Thanks for sharing the link. I assume the 20% performance gap in
> UFFDIO_COPY vs UFFDIO_REMAP is
> just for ioctl calls. But (at least) in case of compaction (our use
> case), COPY increases other overheads.

Per my read:

Yes, we already measured the UFFDIO_COPY is faster than
UFFDIO_REMAP, the userfault latency decreases -20%.

It was the fault latency so it can be more than the pure ioctl
measurements. However I think the point is valid that for this specific
use case it's not purely adding memory but also including removals. It
seems indeed a proper use case to me at least for what I can see now.

> It leads to more page allocations, mem-copies, and madvises than
> required. OTOH, with REMAP:
>
> 1) Page allocations can be mostly avoided by recycling the pages as
> they are freed during compaction
> 2) Memcpy (for compacting objects) into the page (from (1)) is needed
> only once (as compared to COPY wherein it does another memcpy).
> Furthermore, as described in the RFC, sometimes even 1 memcpy isn't
> required (with REMAP)
> 3) As pages are being recycled in userspace, there would be far fewer
> pages to madvise at the end of compaction.
>
> Also, as described in the RFC, REMAP allows moving pages within heap
> for page-level coarse-grained compaction, which helps by avoiding
> swapping in the page. This wouldn't be possible with COPY.

Please feel free to pick up the work if you think that's the right one for
you. IMHO it'll be very helpful if you can justify how REMAP could improve
the use case in the cover letter with some real numbers.

Thanks,

--
Peter Xu