Re: [PATCH v6 8/8] hugetlb: parallelize 1G hugetlb initialization

From: Daniel Jordan
Date: Fri Mar 08 2024 - 12:36:53 EST


On Thu, Feb 22, 2024 at 10:04:21PM +0800, Gang Li wrote:
> Optimizing the initialization speed of 1G huge pages through
> parallelization.
>
> 1G hugetlbs are allocated from bootmem, a process that is already
> very fast and does not currently require optimization. Therefore,
> we focus on parallelizing only the initialization phase in
> `gather_bootmem_prealloc`.
>
> Here are some test results:
>       test case        no patch(ms)   patched(ms)    saved
>  ------------------- -------------- ------------- --------
>   256c2T(4 node) 1G           4745          2024    57.34%
>   128c1T(2 node) 1G           3358          1712    49.02%
>       12T        1G          77000         18300    76.23%

Another great improvement.

> +static void __init gather_bootmem_prealloc_parallel(unsigned long start,
> +						    unsigned long end, void *arg)
> +{
> +	int nid;
> +
> +	for (nid = start; nid < end; nid++)
> +		gather_bootmem_prealloc_node(nid);
> +}
> +
> +static void __init gather_bootmem_prealloc(void)
> +{
> +	struct padata_mt_job job = {
> +		.thread_fn	= gather_bootmem_prealloc_parallel,
> +		.fn_arg		= NULL,
> +		.start		= 0,
> +		.size		= num_node_state(N_MEMORY),
> +		.align		= 1,
> +		.min_chunk	= 1,
> +		.max_threads	= num_node_state(N_MEMORY),
> +		.numa_aware	= true,
> +	};
> +
> +	padata_do_multithreaded(&job);
> +}

Looks fine from the padata side.
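
For readers less familiar with this job shape, here is a rough userspace
sketch of the same idea: a range [start, start + size) is split into chunks,
and each chunk is handed to a thread function that takes (start, end, arg).
It uses pthreads and is only an analogy, not how padata is implemented;
alignment and NUMA-aware placement are not modelled at all. The names
range_job, range_job_run and the demo_* helpers are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of a multithreaded range job. */
struct range_job {
	void (*thread_fn)(unsigned long start, unsigned long end, void *arg);
	void *fn_arg;
	unsigned long start;
	unsigned long size;
	unsigned long min_chunk;	/* smallest range worth its own thread */
	int max_threads;
};

struct range_work {
	const struct range_job *job;
	unsigned long start, end;
};

static void *range_worker(void *p)
{
	struct range_work *w = p;

	w->job->thread_fn(w->start, w->end, w->job->fn_arg);
	return NULL;
}

/* Split the range into at most max_threads chunks and run them concurrently. */
static int range_job_run(const struct range_job *job)
{
	unsigned long end = job->start + job->size;
	unsigned long nthreads, chunk, i, n = 0;
	struct range_work *work;
	pthread_t *tids;
	char *spawned;

	assert(job->max_threads > 0 && job->min_chunk > 0);
	if (job->size == 0)
		return 0;

	nthreads = job->size / job->min_chunk;
	if (nthreads == 0)
		nthreads = 1;
	if (nthreads > (unsigned long)job->max_threads)
		nthreads = job->max_threads;
	chunk = (job->size + nthreads - 1) / nthreads;	/* round up */

	work = calloc(nthreads, sizeof(*work));
	tids = calloc(nthreads, sizeof(*tids));
	spawned = calloc(nthreads, 1);
	if (!work || !tids || !spawned) {
		free(work);
		free(tids);
		free(spawned);
		return -1;
	}

	for (i = 0; i < nthreads; i++) {
		unsigned long pos = job->start + i * chunk;

		if (pos >= end)
			break;
		work[n].job = job;
		work[n].start = pos;
		work[n].end = pos + chunk < end ? pos + chunk : end;
		if (pthread_create(&tids[n], NULL, range_worker, &work[n]) == 0)
			spawned[n] = 1;
		else
			range_worker(&work[n]);	/* could not spawn: run inline */
		n++;
	}

	for (i = 0; i < n; i++)
		if (spawned[i])
			pthread_join(tids[i], NULL);

	free(work);
	free(tids);
	free(spawned);
	return 0;
}

/* Demo: one "node" per index, each index touched by exactly one thread. */
#define DEMO_NODES 4
static int demo_visits[DEMO_NODES];

static void demo_parallel(unsigned long start, unsigned long end, void *arg)
{
	unsigned long nid;

	(void)arg;
	for (nid = start; nid < end; nid++)
		demo_visits[nid]++;
}

static int demo_run(void)
{
	struct range_job job = {
		.thread_fn	= demo_parallel,
		.fn_arg		= NULL,
		.start		= 0,
		.size		= DEMO_NODES,
		.min_chunk	= 1,
		.max_threads	= DEMO_NODES,
	};

	return range_job_run(&job);
}
```

With size and max_threads both set to the node count and min_chunk = 1, as in
the patch, each worker in this sketch ends up with a single-node range, so no
two threads touch the same node.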

Acked-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> # padata