Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements

From: Naoya Horiguchi
Date: Fri Dec 22 2017 - 04:00:12 EST


On 12/20/2017 06:53 PM, Michal Hocko wrote:
> On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote:
>>
>> On 12/15/2017 06:33 PM, Michal Hocko wrote:
>>> Naoya,
>>> this has passed Mike's review (thanks for that!), you have mentioned
>>> that you can pass this through your testing machinery earlier. While
>>> I've done some testing already I would really appreciate if you could
>>> do that as well. Review would be highly appreciated as well.
>>
>> Sorry for my slow response. I reviewed/tested this patchset and looks
>> good to me overall.
>
> No need to feel sorry. This doesn't have an urgent priority. Thanks for
> the review and testing. Can I assume your {Reviewed,Acked}-by or
> Tested-by?
>

Yes, I tested again with additional changes below, and hugetlb migration
works fine from mbind(2). Thank you very much for your work.

Reviewed-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>

for the series.

Thanks,
Naoya Horiguchi

>> I have one comment on the code path from mbind(2).
>> The callback passed to migrate_pages() in do_mbind() (i.e. new_page())
>> calls alloc_huge_page_noerr() which currently doesn't call SetPageHugeTemporary(),
>> so hugetlb migration fails when h->surplus_huge_page >= h->nr_overcommit_huge_pages.
>
> Yes, I am aware of that. I should have been more explicit in the
> changelog. Sorry about that and thanks for pointing it out explicitly.
> To be honest I wasn't really sure what to do about this. The code path
> is really complex and it made my head spin. I fail to see why we have to
> call alloc_huge_page and mess with reservations at all.
>
>> I don't think this is a bug, but it would be better if mbind(2) works
>> more similarly with other migration callers like move_pages(2)/migrate_pages(2).
>
> If the fix is as easy as the following I will add it to the pile.
> Otherwise I would prefer to do this separately after I find some more
> time to understand the callpath.
> ---
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e035002d3fb6..08a4af411e25 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -345,10 +345,9 @@ struct huge_bootmem_page {
> struct page *alloc_huge_page(struct vm_area_struct *vma,
> unsigned long addr, int avoid_reserve);
> struct page *alloc_huge_page_node(struct hstate *h, int nid);
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> - unsigned long addr, int avoid_reserve);
> struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> nodemask_t *nmask);
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address);
> int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
> pgoff_t idx);
>
> @@ -526,7 +525,7 @@ struct hstate {};
> #define alloc_huge_page(v, a, r) NULL
> #define alloc_huge_page_node(h, nid) NULL
> #define alloc_huge_page_nodemask(h, preferred_nid, nmask) NULL
> -#define alloc_huge_page_noerr(v, a, r) NULL
> +#define alloc_huge_page_vma(vma, address) NULL
> #define alloc_bootmem_huge_page(h) NULL
> #define hstate_file(f) NULL
> #define hstate_sizelog(s) NULL
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4426c5b23a20..e00deabe6d17 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1672,6 +1672,25 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
> }
>
> +/* mempolicy aware migration callback */
> +struct page *alloc_huge_page_vma(struct vm_area_struct *vma, unsigned long address)
> +{
> + struct mempolicy *mpol;
> + nodemask_t *nodemask;
> + struct page *page;
> + struct hstate *h;
> + gfp_t gfp_mask;
> + int node;
> +
> + h = hstate_vma(vma);
> + gfp_mask = htlb_alloc_mask(h);
> + node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> + page = alloc_huge_page_nodemask(h, node, nodemask);
> + mpol_cond_put(mpol);
> +
> + return page;
> +}
> +
> /*
> * Increase the hugetlb pool such that it can accommodate a reservation
> * of size 'delta'.
> @@ -2077,20 +2096,6 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
> return ERR_PTR(-ENOSPC);
> }
>
> -/*
> - * alloc_huge_page()'s wrapper which simply returns the page if allocation
> - * succeeds, otherwise NULL. This function is called from new_vma_page(),
> - * where no ERR_VALUE is expected to be returned.
> - */
> -struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
> - unsigned long addr, int avoid_reserve)
> -{
> - struct page *page = alloc_huge_page(vma, addr, avoid_reserve);
> - if (IS_ERR(page))
> - page = NULL;
> - return page;
> -}
> -
> int alloc_bootmem_huge_page(struct hstate *h)
> __attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
> int __alloc_bootmem_huge_page(struct hstate *h)
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f604b22ebb65..96823fa07f38 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1121,8 +1121,7 @@ static struct page *new_page(struct page *page, unsigned long start, int **x)
> }
>
> if (PageHuge(page)) {
> - BUG_ON(!vma);
> - return alloc_huge_page_noerr(vma, address, 1);
> + return alloc_huge_page_vma(vma, address);
> } else if (thp_migration_supported() && PageTransHuge(page)) {
> struct page *thp;
>
>