Re: cgroup and FALLOC_FL_PUNCH_HOLE: WARNING: CPU: 13 PID: 2438 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x5

From: David Hildenbrand
Date: Wed Oct 21 2020 - 08:42:17 EST


On 21.10.20 05:35, Mike Kravetz wrote:
> On 10/20/20 6:38 AM, David Hildenbrand wrote:
>>
>> I'm bisecting the warning right now. Looks like it was introduced in v5.7.
>

So bisecting nailed it down to one of

353b2de42e84 mm/hugetlb.c: clean code by removing unnecessary initialization
a9b3f867404b hugetlb: support file_region coalescing again
08cf9faf7558 hugetlb_cgroup: support noreserve mappings
075a61d07a8e hugetlb_cgroup: add accounting for shared mappings
0db9d74ed884 hugetlb: disable region_add file_region coalescing
e9fe92ae0cd2 hugetlb_cgroup: add reservation accounting for private mappings
9808895e1a44 mm/hugetlb_cgroup: fix hugetlb_cgroup migration
1adc4d419aa2 hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
cdc2fcfea79b hugetlb_cgroup: add hugetlb_cgroup reservation counter

So it seems to have been broken right from the introduction of the
charge/uncharge reservation accounting. Not a surprise when looking at your
fixes :)


> I found the following bugs in the cgroup reservation accounting. The ones
> in region_del are pretty obvious as the number of pages to uncharge would
> always be zero. The one on alloc_huge_page needs racing code to expose.
>
> With these fixes, my testing is showing consistent/correct results for
> hugetlb reservation cgroup accounting.
>
> It would be good if Mina (at least) would look these over. Would also
> be interesting to know if these fixes address the bug seen with the qemu
> use case.

I strongly suspect it will. Compiling now; I'll reply with the result in half
an hour or so.
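
FWIW, stripped of all the qemu specifics, the essence of the trigger is hole
punching a hugetlbfs file that is mapped shared while running inside a
hugetlb cgroup with the reservation controller -- very roughly (just a
sketch, not the actual qemu scenario; it assumes a 2 MiB hugetlbfs mount at
/dev/hugepages):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t hpage = 2UL * 1024 * 1024;
	size_t size = 2 * hpage;
	/* assumes hugetlbfs is mounted at /dev/hugepages */
	int fd = open("/dev/hugepages/repro", O_CREAT | O_RDWR, 0600);

	if (fd < 0 || ftruncate(fd, size))
		return 1;

	/* a shared mapping creates a hugetlb reservation, which is what the
	 * cgroup reservation counter accounts */
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/* punching only the first hugepage trims the reserved file region
	 * via region_del() -- the path the fixes below touch */
	fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, hpage);

	munmap(p, size);
	close(fd);
	unlink("/dev/hugepages/repro");
	return 0;
}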

>
> I'm still doing more testing and code inspection to look for other issues.

When sending, can you make sure to credit Michal P.? Thanks!

Reported-by: Michal Privoznik <mprivozn@xxxxxxxxxx>

>
> From 861bcd7d0443f18a5fed3c3ddc5f1c71e78c4ef4 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Date: Tue, 20 Oct 2020 20:21:42 -0700
> Subject: [PATCH] hugetlb_cgroup: fix reservation accounting
>
> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
> mm/hugetlb.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 67fc6383995b..c92366313780 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -685,17 +685,17 @@ static long region_del(struct resv_map *resv, long f, long t)
> }
>
> if (f <= rg->from) { /* Trim beginning of region */
> - del += t - rg->from;
> - rg->from = t;
> -
> hugetlb_cgroup_uncharge_file_region(resv, rg,
> t - rg->from);
> - } else { /* Trim end of region */
> - del += rg->to - f;
> - rg->to = f;
>
> + del += t - rg->from;
> + rg->from = t;
> + } else { /* Trim end of region */
> hugetlb_cgroup_uncharge_file_region(resv, rg,
> rg->to - f);
> +
> + del += rg->to - f;
> + rg->to = f;

Those two look very correct to me.

You could keep computing "del" before the uncharge, similar to the /*
Remove entire region */ case.
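Something like this (just a sketch, untested):

	if (f <= rg->from) {	/* Trim beginning of region */
		del += t - rg->from;
		hugetlb_cgroup_uncharge_file_region(resv, rg,
						    t - rg->from);
		rg->from = t;
	} else {		/* Trim end of region */
		del += rg->to - f;
		hugetlb_cgroup_uncharge_file_region(resv, rg,
						    rg->to - f);
		rg->to = f;
	}

Either way, the important bit is that the uncharge amount is computed from
the old rg->from / rg->to, so it is no longer always 0.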

> }
> }
>
> @@ -2454,6 +2454,9 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>
> rsv_adjust = hugepage_subpool_put_pages(spool, 1);
> hugetlb_acct_memory(h, -rsv_adjust);
> + if (deferred_reserve)
> + hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
> + pages_per_huge_page(h), page);

That looks correct to me as well.
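
IIUC, with deferred_reserve the reservation was charged to the cgroup
directly earlier in alloc_huge_page() -- from memory (so take the exact calls
with a grain of salt), roughly:

	/* earlier in alloc_huge_page(): no reservation consumed from the
	 * resv_map, so charge the cgroup reservation counter directly */
	if (deferred_reserve) {
		ret = hugetlb_cgroup_charge_cgroup_rsvd(idx,
				pages_per_huge_page(h), &h_cg);
		if (ret)
			goto out_subpool_put;
	}

so when we hand the unused reservation back to the subpool here, that charge
has to be dropped again as well.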

> }
> return page;
>
>

Thanks for debugging!

--
Thanks,

David / dhildenb