Re: [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug events

From: Dave Hansen
Date: Fri May 21 2021 - 18:16:05 EST


On 5/21/21 3:28 AM, Mel Gorman wrote:
> The PCP high watermark is based on the number of online CPUs so the
> watermarks must be adjusted during CPU hotplug. At the time of
> hot-remove, the number of online CPUs is already adjusted but during
> hot-add, a delta needs to be applied to update PCP to the correct
> value. After this patch is applied, the high watermarks are adjusted
> correctly.
>
> # grep high: /proc/zoneinfo | tail -1
> high: 649
> # echo 0 > /sys/devices/system/cpu/cpu4/online
> # grep high: /proc/zoneinfo | tail -1
> high: 664
> # echo 1 > /sys/devices/system/cpu/cpu4/online
> # grep high: /proc/zoneinfo | tail -1
> high: 649

This is really more of a comment about the previous patch, but the issue
doesn't become apparent until the example above.

In your example, you mentioned increased exit() performance by using
"vm.percpu_pagelist_fraction to increase the pcp->high value". That's
presumably because of the increased batching effects and fewer lock
acquisitions.

But, logically, doesn't that mean the more CPUs you have in a node, the
*higher* you want pcp->high to be? Since pcp->high is the low watermark
divided by the number of local CPUs, taking this to the extreme with an
absurd number of CPUs in a node could leave us with a too-small
pcp->high value.
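
To put rough numbers on that concern, here is a minimal userspace sketch
of the division quoted in the patch below
(high = low_wmark_pages(zone) / nr_local_cpus). The helper name and the
sample values are made up for illustration, and the sketch ignores any
clamping the real zone_highsize() may apply afterwards.

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-in for the quoted calculation:
 *     high = low_wmark_pages(zone) / nr_local_cpus;
 * Not the kernel function; names and types are assumptions.
 */
static unsigned long sketch_pcp_high(unsigned long low_wmark_pages,
				     unsigned int nr_local_cpus)
{
	/* Mirror the max(1U, ...) guard so we never divide by zero. */
	if (nr_local_cpus < 1)
		nr_local_cpus = 1;
	assert(nr_local_cpus >= 1);

	/* Integer division: each local CPU gets an equal share. */
	return low_wmark_pages / nr_local_cpus;
}
```

With an assumed low watermark of 10000 pages, sketch_pcp_high(10000, 4)
gives 2500 while sketch_pcp_high(10000, 1024) gives 9, which is the
"more CPUs means smaller pcp->high" direction I'm asking about.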

Also, do you worry at all about a zone with a low min_free_kbytes seeing
increased zone lock contention?

...
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf5cdc466e6c..2761b03b3a44 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6628,7 +6628,7 @@ static int zone_batchsize(struct zone *zone)
>  #endif
>  }
>  
> -static int zone_highsize(struct zone *zone)
> +static int zone_highsize(struct zone *zone, int cpu_online)
>  {
>  #ifdef CONFIG_MMU
>  	int high;
> @@ -6640,7 +6640,7 @@ static int zone_highsize(struct zone *zone)
>  	 * CPUs local to a zone. Note that early in boot that CPUs may
>  	 * not be online yet.
>  	 */
> -	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));
> +	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
>  	high = low_wmark_pages(zone) / nr_local_cpus;

Is this "+ cpu_online" bias there because the CPU isn't yet in
cpumask_of_node() when the CPU hotplug callback occurs? If so, it would
be nice to mention that in a comment.
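
If that reading is right, the arithmetic would look something like the
sketch below. This is only a userspace illustration of the quoted
expression; the helper name and parameters are hypothetical, and whether
the CPU is really missing from cpumask_of_node() at callback time is
exactly the assumption being asked about.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the quoted expression:
 *     max(1U, cpumask_weight(cpumask_of_node(nid))) + cpu_online
 * 'cpus_in_node_mask' plays the role of cpumask_weight(...).
 * 'cpu_online' is the bias: assumed to be 1 when a CPU is being added
 * but is not yet counted in the mask, and 0 otherwise.
 */
static unsigned int sketch_nr_local_cpus(unsigned int cpus_in_node_mask,
					 int cpu_online)
{
	unsigned int n = cpus_in_node_mask > 1 ? cpus_in_node_mask : 1;

	assert(cpu_online == 0 || cpu_online == 1);

	/* The bias is added after the max(), exactly as in the quoted line. */
	return n + (unsigned int)cpu_online;
}
```

Under that assumption, a node with 3 CPUs in the mask plus one incoming
CPU yields sketch_nr_local_cpus(3, 1) == 4, matching what the mask would
report once the CPU is fully online. Note that, as the expression is
written, an empty mask with cpu_online == 1 yields 2 rather than 1.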