Re: [PATCH v4 1/2] thp, dax: add thp_get_unmapped_area for pmd mappings

From: Kirill A. Shutemov
Date: Sun Apr 24 2016 - 18:51:19 EST


On Fri, Apr 22, 2016 at 06:21:22PM -0600, Toshi Kani wrote:
> When CONFIG_FS_DAX_PMD is set, DAX supports mmap() using pmd page
> size. This feature relies on both mmap virtual address and FS
> block (i.e. physical address) to be aligned by the pmd page size.
> Users can use mkfs options to specify FS to align block allocations.
> However, aligning mmap address requires code changes to existing
> applications for providing a pmd-aligned address to mmap().
>
> For instance, fio with "ioengine=mmap" performs I/Os with mmap() [1].
> It calls mmap() with a NULL address, which needs to be changed to
> provide a pmd-aligned address for testing with DAX pmd mappings.
> Changing all applications that call mmap() with NULL is undesirable.
>
> Add thp_get_unmapped_area(), which can be called by filesystem's
> get_unmapped_area to align an mmap address by the pmd size for
> a DAX file. It calls the default handler, mm->get_unmapped_area(),
> to find a range and then aligns it for a DAX file.
>
> thp_get_unmapped_area() can be extended for huge page cache support.
>
> The patch is based on Matthew Wilcox's change that allows adding
> support of the pud page size easily.

See Hugh's implementation:

http://lkml.kernel.org/r/alpine.LSU.2.11.1604051420110.5965@xxxxxxxxxxxx

> [1]: https://github.com/axboe/fio/blob/master/engines/mmap.c
> Signed-off-by: Toshi Kani <toshi.kani@xxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: Theodore Ts'o <tytso@xxxxxxx>
> Cc: Andreas Dilger <adilger.kernel@xxxxxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
> include/linux/huge_mm.h | 7 +++++++
> mm/huge_memory.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 50 insertions(+)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 7008623..3769674 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -85,6 +85,10 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
>
> extern unsigned long transparent_hugepage_flags;
>
> +extern unsigned long thp_get_unmapped_area(struct file *filp,
> + unsigned long addr, unsigned long len, unsigned long pgoff,
> + unsigned long flags);
> +
> extern void prep_transhuge_page(struct page *page);
> extern void free_transhuge_page(struct page *page);
>
> @@ -163,6 +167,9 @@ struct page *get_huge_zero_page(void);
> #define transparent_hugepage_enabled(__vma) 0
>
> #define transparent_hugepage_flags 0UL
> +
> +#define thp_get_unmapped_area NULL
> +
> static inline int
> split_huge_page_to_list(struct page *page, struct list_head *list)
> {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 86f9f8b..2181c7f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -790,6 +790,49 @@ void prep_transhuge_page(struct page *page)
> set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
> }
>
> +unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
> + loff_t off, unsigned long flags, unsigned long size)
> +{
> + unsigned long addr;
> + loff_t off_end = off + len;
> + loff_t off_align = round_up(off, size);
> + unsigned long len_pad;
> +
> + if (off_end <= off_align || (off_end - off_align) < size)
> + return 0;
> +
> + len_pad = len + size;
> + if (len_pad < len || (off + len_pad) < off)
> + return 0;
> +
> + addr = current->mm->get_unmapped_area(filp, 0, len_pad,
> + off >> PAGE_SHIFT, flags);
> + if (IS_ERR_VALUE(addr))
> + return 0;
> +
> + addr += (off - addr) & (size - 1);
> + return addr;

Hugh has more sanity checks before and after call to get_unmapped_area().
Please, consider borrowing them.

> +}
> +
> +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> + loff_t off = (loff_t)pgoff << PAGE_SHIFT;
> +
> + if (addr)
> + goto out;

I think it's too strong reaction to hint, isn't it?
We definately need this for MAP_FIXED. But in general? Maybe.

> + if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
> + goto out;
> +
> + addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
> + if (addr)
> + return addr;
> +
> + out:
> + return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> +}
> +EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
> +
> static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
> struct vm_area_struct *vma,
> unsigned long address, pmd_t *pmd,
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

--
Kirill A. Shutemov