Re: [linus:master] [mm/mmap] 28c5609fb2: aim9.page_test.ops_per_sec -10.8% regression

From: Liam R. Howlett
Date: Tue May 09 2023 - 18:35:26 EST


* Yin Fengwei <fengwei.yin@xxxxxxxxx> [230509 02:56]:
> Hi Liam,
>
> On 5/6/23 14:20, kernel test robot wrote:
> > Hello,
> >
> > kernel test robot noticed a -10.8% regression of aim9.page_test.ops_per_sec on:
> >
> > commit: 28c5609fb236807910ca347ad3e26c4567998526 ("mm/mmap: preallocate maple nodes for brk vma expansion")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > testcase: aim9
> > test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
> > parameters:
> >
> > testtime: 5s
> > test: all
> > cpufreq_governor: performance
> >
> > If you fix the issue, kindly add following tag
> > | Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
> > | Link: https://lore.kernel.org/oe-lkp/202305061457.ac15990c-yujie.liu@xxxxxxxxx
> >
>
> Some finding related:
> eBPF funclatency tool says the latency of function do_brk_flags() doubles
> with the patch 28c5609fb2.
>
> With the patch 28c5609fb2, the mas_alloc_nodes() is called much more than
> without the patch.

Thank you for the insight into this test.

Right, so this patch adds a call to preallocate nodes for the worst
possible case. That certainly explains why you see so many more calls
to allocate nodes: that is exactly what it was meant to do.

>
> In my local debugging env, I can see around 17009999 times call to
> mas_alloc_nodes(). The number is zero without the patch 28c5609fb2.
> So we are kind of sure the regression is connected to the patch.
>
>
> The page_test of AIM9 is doing following work with single thread:
> newbrk = sbrk(1024 * 1024); /* move up 1 megabyte */
> while (true) { /* while not done */
> newbrk = sbrk(-4096 * 16); /* deallocate some space */
> for (i = 0; i < 16; i++) { /* now get it back in pieces */
> newbrk = sbrk(4096); /* Get pointer to new space */
> }
> }
>
> Is it possible that the sbrk pattern triggers the corner case? Thanks.

I appreciate the analysis and the pointer to the allocation code. This
has shown up somewhere else and I'm working on reducing the
preallocations. This regression seems to be hidden, sometimes at least,
by the kmem_cache.
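
For anyone who wants to exercise this path outside of AIM9, the sbrk
pattern quoted above can be sketched as a standalone C function. This is
only a sketch under assumptions: page_test_loop() and its iteration
count are hypothetical names that are not part of AIM9, the 4096-byte
page size is taken from the quoted loop, and the error checks and the
final release of the initial megabyte are additions. Each sbrk(4096)
call in the inner loop is a brk expansion, which is the path the commit
changed.

```c
#define _DEFAULT_SOURCE		/* sbrk() declaration on glibc */
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from AIM9: repeat the quoted sbrk
 * pattern for a fixed number of outer iterations.
 *
 * Returns the number of sbrk() calls made inside the loop
 * (17 per iteration), or -1 if any sbrk() call fails.
 */
long page_test_loop(long iterations)
{
	const intptr_t page = 4096;	/* assumed page size, as in the quote */
	long calls = 0;

	/* Move the break up 1 megabyte. */
	if (sbrk(1024 * 1024) == (void *)-1)
		return -1;

	for (long n = 0; n < iterations; n++) {
		/* Deallocate 16 pages in one call. */
		if (sbrk(-page * 16) == (void *)-1)
			return -1;
		calls++;

		/* Get the space back one page at a time. */
		for (int i = 0; i < 16; i++) {
			if (sbrk(page) == (void *)-1)
				return -1;
			calls++;
		}
	}

	/* Addition to the quoted loop: give the initial megabyte back. */
	(void)sbrk(-1024 * 1024);

	return calls;
}
```

Calling page_test_loop() with a large count from a small main() while
tracing do_brk_flags() with funclatency should allow a comparison with
and without the commit, although the absolute numbers will not match the
AIM9 results.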

Regards,
Liam