Re: [PATCH v3] mm: add mremap flag for preserving the old mapping

From: Andy Lutomirski
Date: Tue Sep 30 2014 - 13:50:06 EST


On Sep 30, 2014 2:36 AM, "Daniel Micay" <danielmicay@xxxxxxxxx> wrote:
>
> On 30/09/14 01:53 AM, Andy Lutomirski wrote:
> > On Mon, Sep 29, 2014 at 9:55 PM, Daniel Micay <danielmicay@xxxxxxxxx> wrote:
> >> This introduces the MREMAP_RETAIN flag for preserving the source mapping
> >> when MREMAP_MAYMOVE moves the pages to a new destination. Accesses to
> >> the source location will fault and cause fresh pages to be mapped in.
> >>
> >> For consistency, the old_len >= new_len case could decommit the pages
> >> instead of unmapping. However, userspace can accomplish the same thing
> >> via madvise and a coherent definition of the flag is possible without
> >> the extra complexity.
> >
> > IMO this needs very clear documentation of exactly what it does.
>
> Agreed, and thanks for the review. I'll post a slightly modified version
> of the patch soon (mostly more commit message changes).
>
> > Does it preserve the contents of the source pages? (If so, why?
> > Aren't you wasting a bunch of time on page faults and possibly
> > unnecessary COWs?)
>
> The source will act as if it was just created. For an anonymous memory
> mapping, it will fault on any accesses and bring in new zeroed pages.
>
> In jemalloc, it replaces an enormous memcpy(dst, src, size) followed by
> madvise(src, size, MADV_DONTNEED) with mremap. Using mremap also ends up
> eliding page faults from writes at the destination.
>
> TCMalloc has nearly the same page allocation design, although it tries
> to throttle the purging so it won't always gain as much.
>
> > Does it work on file mappings? Can it extend file mappings while it moves them?
>
> It works on file mappings. If a move occurs, there will be the usual
> extended destination mapping but with the source mapping left intact.
>
> It wouldn't be useful with existing allocators, but in theory a general
> purpose allocator could expose an MMIO API in order to reuse the same
> address space via MAP_FIXED/MREMAP_FIXED to reduce VM fragmentation.
>
> > If you MREMAP_RETAIN a partially COWed private mapping, what happens?
>
> The original mapping is zeroed in the following test, as it would be
> without fork:
>
> #define _GNU_SOURCE
>
> #include <string.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
> #include <sys/wait.h>
>
> int main(void) {
>     size_t size = 1024 * 1024;
>     char *orig = mmap(NULL, size, PROT_READ|PROT_WRITE,
>                       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>     if (orig == MAP_FAILED)
>         return 1;
>     memset(orig, 5, size);
>     int pid = fork();
>     if (pid == -1)
>         return 1;
>     if (pid == 0) {
>         /* Child: write to the first page so the mapping is partially COWed. */
>         memset(orig, 5, 1024);
>         /* 4 stands in for the proposed MREMAP_RETAIN flag. */
>         char *new = mremap(orig, size, size * 128, MREMAP_MAYMOVE|4);
>         if (new == MAP_FAILED || new == orig)
>             return 1;
>         /* The destination must hold the old contents... */
>         for (size_t i = 0; i < size; i++)
>             if (new[i] != 5)
>                 return 1;
>         /* ...and the retained source must read back as zeroes. */
>         for (size_t i = 0; i < size; i++)
>             if (orig[i] != 0)
>                 return 1;
>         return 0;
>     }
>     /* Parent: propagate the child's exit status. */
>     int status;
>     if (wait(&status) == -1)
>         return 1;
>     if (WIFEXITED(status))
>         return WEXITSTATUS(status);
>     return 1;
> }
>
> Hopefully this is the case you're referring to. :)

What about private file mappings?

>
> > Does it work on special mappings? If so, please prevent it from doing
> > so. mremapping x86's vdso is a thing, and duplicating x86's vdso
> > should not become a thing, because x86_32 in particular will become
> > extremely confused.
>
> I'll add a check for arch_vma_name(vma) == NULL.

Careful! That function is deprecated in favor of vm_ops->name.

I think it might pay to add an explicit vm_op to authorize
duplication, especially for non-COW mappings. IOW this kind of
extension seems quite magical for anything that doesn't have the
normal COW semantics, including for plain old read-only mappings.

>
> There's an existing check for VM_DONTEXPAND | VM_PFNMAP when expanding
> allocations (the only case this flag impacts). Are there other kinds of
> special mappings that you're referring to?

I was referring to special mappings in the install_special_mapping
sense. Those may or may not have VM_PFNMAP set.

If VM_DONTEXPAND blocks this new feature entirely, that's probably good.

--Andy