Re: [PATCH] mm/hugeltb: fix nodes huge page allocation when there are surplus pages

From: Mike Kravetz
Date: Mon Aug 28 2023 - 19:38:18 EST


On 08/26/23 11:57, Xueshi Hu wrote:
> In set_nr_huge_pages(), local variable "count" is used to record
> persistent_huge_pages(), but when it cames to nodes huge page allocation,
> the semantics changes to nr_huge_pages. When there exists surplus huge
> pages and using the interface under
> /sys/devices/system/node/node*/hugepages to change huge page pool size,
> this difference can result in the allocation of an unexpected number of
> huge pages.
>
> Steps to reproduce the bug:
>
> Starting with:
>
> Node 0 Node 1 Total
> HugePages_Total 0.00 0.00 0.00
> HugePages_Free 0.00 0.00 0.00
> HugePages_Surp 0.00 0.00 0.00
>
> create 100 huge pages in Node 0 and consume it, then set Node 0 's
> nr_hugepages to 0.
>
> yields:
>
> Node 0 Node 1 Total
> HugePages_Total 200.00 0.00 200.00
> HugePages_Free 0.00 0.00 0.00
> HugePages_Surp 200.00 0.00 200.00
>
> write 100 to Node 1's nr_hugepages
>
> echo 100 > /sys/devices/system/node/node1/\
> hugepages/hugepages-2048kB/nr_hugepages
>
> gets:
>
> Node 0 Node 1 Total
> HugePages_Total 200.00 400.00 600.00
> HugePages_Free 0.00 400.00 400.00
> HugePages_Surp 200.00 0.00 200.00
>
> Kernel is expected to create only 100 huge pages and it gives 200.
>
> Fixes: fd875dca7c71 ("hugetlbfs: fix potential over/underflow setting node specific nr_hugepages")
> Signed-off-by: Xueshi Hu <xueshi.hu@xxxxxxxxxx>
> ---
> mm/hugetlb.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)

Thank you!

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>

A note about the Fixes: tag. Commit fd875dca7c71 moved the root cause of this
issue (the line changed below) from the routine __nr_hugepages_store_common
to the routine set_max_huge_pages. I believe the actual issue has existed
since commit 9a30523066cde that added hugetlb node specific support way back
in 2009 (2.6.32 timeframe).

If 9a30523066cde is used in the fixes tag, there will be some non-automatic
backports for releases where fd875dca7c71 does not exist. I would suggest
changing the fixes tag. We can ignore the broken backports for older releases,
as I really doubt this is a real issue for many/any people.
--
Mike Kravetz

>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6da626bfb52e..54e2e2e12aa9 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3494,7 +3494,9 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
> if (nid != NUMA_NO_NODE) {
> unsigned long old_count = count;
>
> - count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
> + count += persistent_huge_pages(h) -
> + (h->nr_huge_pages_node[nid] -
> + h->surplus_huge_pages_node[nid]);
> /*
> * User may have specified a large count value which caused the
> * above calculation to overflow. In this case, they wanted
> --
> 2.40.1
>