Re: [PATCH] x86 get_unmapped_area: Add PMD alignment for DAX PMD mmap

From: Toshi Kani
Date: Wed Apr 06 2016 - 13:53:02 EST


On Wed, 2016-04-06 at 12:50 -0400, Matthew Wilcox wrote:
> On Wed, Apr 06, 2016 at 07:58:09AM -0600, Toshi Kani wrote:
> >
> > When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using PMD page
> > size.  This feature relies on both mmap virtual address and FS
> > block data (i.e. physical address) to be aligned by the PMD page
> > size.  Users can use mkfs options to specify FS to align block
> > allocations.  However, aligning mmap() address requires application
> > changes to mmap() calls, such as:
> >
> >  -  /* let the kernel assign a mmap addr */
> >  -  mptr = mmap(NULL, fsize, PROT_READ|PROT_WRITE, FLAGS, fd, 0);
> >
> >  +  /* 1. obtain a PMD-aligned virtual address */
> >  +  ret = posix_memalign(&mptr, PMD_SIZE, fsize);
> >  +  if (!ret)
> >  +    free(mptr);  /* 2. release the virt addr */
> >  +
> >  +  /* 3. then pass the PMD-aligned virt addr to mmap() */
> >  +  mptr = mmap(mptr, fsize, PROT_READ|PROT_WRITE, FLAGS, fd, 0);
> >
> > These changes add unnecessary dependency to DAX and PMD page size
> > into application code.  The kernel should assign a mmap address
> > appropriate for the operation.
>
> I question the need for this patch.  Choosing an appropriate base address
> is the least of the changes needed for an application to take advantage
> of DAX.

An application also needs to make sure that the given range
[base, base+size) is not covered by an existing VMA.  The above example
uses posix_memalign() to find such a range, which in turn calls mmap()
with a size of (fsize + PMD_SIZE) in this case.
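
For illustration only (this is not part of the patch), the same
over-allocate-and-align idea can be written directly against mmap().
The sketch below is a minimal user-space example under stated
assumptions: map_pmd_aligned() and PMD_SIZE_ASSUMED are hypothetical
names, and the 2 MiB value assumes x86-64 with 4 KiB base pages, since
user space has no PMD_SIZE macro.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MiB PMD size (x86-64 with 4 KiB base pages). */
#define PMD_SIZE_ASSUMED (2UL << 20)

/*
 * Hypothetical helper: map fsize bytes of fd at a PMD-aligned address.
 * Returns MAP_FAILED on error.
 */
static void *map_pmd_aligned(int fd, size_t fsize)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = (fsize + page - 1) & ~(page - 1);	/* page-rounded size */
	size_t span = len + PMD_SIZE_ASSUMED;		/* room to align */
	char *raw, *aligned;
	size_t head;
	void *p;

	/* 1. Reserve an over-sized range so an aligned start must fit. */
	raw = mmap(NULL, span, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return MAP_FAILED;

	/* 2. Round the start up to the next PMD boundary. */
	aligned = (char *)(((uintptr_t)raw + PMD_SIZE_ASSUMED - 1) &
			   ~(uintptr_t)(PMD_SIZE_ASSUMED - 1));
	head = (size_t)(aligned - raw);

	/*
	 * 3. Map the file over the reservation.  MAP_FIXED only replaces
	 *    pages inside our own reservation, so nothing else is clobbered.
	 */
	p = mmap(aligned, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_FIXED, fd, 0);
	if (p == MAP_FAILED) {
		munmap(raw, span);
		return MAP_FAILED;
	}

	/* 4. Release the unused head and tail of the reservation. */
	if (head)
		munmap(raw, head);
	munmap(aligned + len, span - len - head);
	return p;
}
```

Unlike the posix_memalign()/free() sequence, the range stays reserved
between steps 1 and 3, so another thread cannot take the address in
between.  The application still has to know the PMD size, which is the
dependency the patch is trying to remove.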

> The NVML chooses appropriate addresses and gets a properly aligned
> address without any kernel code.

An application like NVML can continue to specify a specific address to
mmap().  Most existing applications, however, do not specify an address to
mmap().  With this patch, specifying an address will remain optional.

> > Change arch_get_unmapped_area() and arch_get_unmapped_area_topdown()
> > to request PMD_SIZE alignment when the request is for a DAX file and
> > its mapping range is large enough for using a PMD page.
>
> I think this is the wrong place for it, if we decide that this is the
> right thing to do.  The filesystem has a get_unmapped_area() which
> should be used instead.

Yes, I considered adding a filesystem entry point, but decided to go this
way because:
 - arch_get_unmapped_area() and arch_get_unmapped_area_topdown() are
   arch-specific code.  Therefore, this filesystem entry point will need
   an arch-specific implementation.
 - There is nothing filesystem specific about requesting PMD alignment.
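
To illustrate the second point, the alignment request comes down to a
mask and an offset handed to the address search.  The sketch below is a
user-space model only, not kernel code from the patch:
align_gap_start() and PMD_SIZE_ASSUMED are hypothetical names, and the
arithmetic is written to mirror, as far as I recall, how the generic
search adjusts a candidate gap using info.align_mask and
info.align_offset.

```c
#include <stdint.h>

/* Assumption: 2 MiB PMD size (x86-64 with 4 KiB base pages). */
#define PMD_SIZE_ASSUMED (2UL << 20)

/*
 * Hypothetical model: move a candidate gap start up to the next address
 * that is congruent to align_offset modulo (align_mask + 1).
 * align_mask is assumed to be a power of two minus one.
 */
static uint64_t align_gap_start(uint64_t gap_start, uint64_t align_mask,
				uint64_t align_offset)
{
	return gap_start + ((align_offset - gap_start) & align_mask);
}
```

With align_mask set to PMD_SIZE_ASSUMED - 1 and align_offset set to 0,
any candidate address is rounded up to a 2 MiB boundary, and nothing in
the computation depends on which filesystem backs the file.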

> >
> > @@ -157,6 +157,13 @@ arch_get_unmapped_area(struct file *filp, unsigned
> > long addr,
> >   info.align_mask = get_align_mask();
> >   info.align_offset += get_align_bits();
> >   }
> > + if (filp && IS_ENABLED(CONFIG_FS_DAX_PMD) &&
> > IS_DAX(file_inode(filp))) {
>
> And there's never a need for the IS_ENABLED.  IS_DAX() compiles to '0' if
> CONFIG_FS_DAX is disabled.

CONFIG_FS_DAX_PMD can be disabled while CONFIG_FS_DAX is enabled.  In that
case IS_DAX() can still be true, but PMD alignment is not wanted.

> And where would this end?  Would you also change this code to look for
> 1GB entries if CONFIG_FS_DAX_PUD is enabled?  Far better to have this
> in the individual filesystem (probably calling a common helper in the DAX
> code).

Yes, it can be easily extended to support PUD.  This avoids another round
of application changes to align with the PUD size.

If the PUD support turns out to be filesystem specific, we may need a
capability bit in addition to CONFIG_FS_DAX_PUD.

Thanks,
-Toshi