Re: [PATCH] mm: include CMA pages in lowmem_reserve at boot

From: Michal Hocko
Date: Thu Aug 13 2020 - 07:17:40 EST


On Wed 12-08-20 20:51:38, Doug Berger wrote:
> The lowmem_reserve arrays provide a means of applying pressure
> against allocations from lower zones that were targeted at
> higher zones. Their values are a function of the number of pages
> managed by higher zones and are assigned by a call to the
> setup_per_zone_lowmem_reserve() function.
>
> The function is initially called at boot time by the function
> init_per_zone_wmark_min() and may be called later by accesses
> of the /proc/sys/vm/lowmem_reserve_ratio sysctl file.
>
> The function init_per_zone_wmark_min() was moved up from a
> module_init to a core_initcall to resolve a sequencing issue
> with khugepaged. Unfortunately this created a sequencing issue
> with CMA page accounting.
>
> The CMA pages are added to the managed page count of a zone
> when cma_init_reserved_areas() is called at boot, also as a
> core_initcall. This makes it uncertain whether the CMA pages
> will be added to the managed page counts of their zones before
> or after the call to init_per_zone_wmark_min() as it becomes
> dependent on link order. With the current link order the pages
> are added to the managed count after the lowmem_reserve arrays
> are initialized at boot.
>
> This means the lowmem_reserve values at boot may be lower than
> the values used later if /proc/sys/vm/lowmem_reserve_ratio is
> accessed even if the ratio values are unchanged.
>
> In many cases the difference is not significant, but in others
> it may have an effect.

Could you be more specific please?
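
To illustrate the kind of numbers that would help here, below is a rough
userspace model of the calculation. It is only a sketch, not the kernel
code: the function name model_lowmem_reserve(), the two-zone layout and
all page counts are made up for illustration. It assumes the reserve a
lower zone keeps against allocations targeted at a higher zone is the
managed page count of the zones above it divided by the lower zone's
ratio.

```c
#include <string.h>

#define NR_ZONES 2	/* hypothetical layout: zone 0 = lower, zone 1 = higher */

/*
 * Simplified model (an assumption, not the kernel implementation):
 * reserve[i][j] is what zone i holds back from allocations that were
 * targeted at the higher zone j. It is derived from the managed pages
 * of zones i+1..j divided by ratio[i]. A ratio of 0 disables the reserve.
 */
static void model_lowmem_reserve(const unsigned long managed[NR_ZONES],
				 const unsigned long ratio[NR_ZONES],
				 unsigned long reserve[NR_ZONES][NR_ZONES])
{
	memset(reserve, 0, sizeof(unsigned long) * NR_ZONES * NR_ZONES);

	for (int j = 0; j < NR_ZONES; j++) {
		/* pages managed by zone j and, as we walk down, the zones below it */
		unsigned long pages = managed[j];

		for (int i = j - 1; i >= 0; i--) {
			reserve[i][j] = ratio[i] ? pages / ratio[i] : 0;
			pages += managed[i];
		}
	}
}
```

With a made-up upper zone of 200000 managed pages, 65536 CMA pages
added to it afterwards and a ratio of 256 for the lower zone, this model
gives a reserve of 781 pages when computed before the CMA pages are
counted and 1037 pages when recomputed after, with the ratio unchanged.
Real numbers from an affected system would be more convincing.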

> This commit breaks the link order dependency by invoking
> init_per_zone_wmark_min() as a postcore_initcall so that the
> CMA pages have the chance to be properly accounted in their
> zone(s) and the lowmem_reserve arrays receive consistent
> values.
>
> Fixes: bc22af74f271 ("mm: update min_free_kbytes from khugepaged after core initialization")
> Signed-off-by: Doug Berger <opendmb@xxxxxxxxx>
> ---
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8b7d0ecf30b1..f3e340ec2b6b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7887,7 +7887,7 @@ int __meminit init_per_zone_wmark_min(void)
>
> return 0;
> }
> -core_initcall(init_per_zone_wmark_min)
> +postcore_initcall(init_per_zone_wmark_min)
>
> /*
> * min_free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so
> --
> 2.7.4
>

--
Michal Hocko
SUSE Labs