Re: [linus:master] will-it-scale.per_thread_ops -40.2% regression in mmap1 benchmark

From: Liam Howlett
Date: Mon Dec 19 2022 - 13:10:36 EST


* kernel test robot <yujie.liu@xxxxxxxxx> [221219 05:01]:
> Greetings,
>
> FYI, we noticed a -40.2% regression of will-it-scale.per_thread_ops
> between commit 524e00b36e8c and e15e06a83923 of mainline

Thank you for running this test.

We are aware of this regression. It was accepted as a trade-off for
the gain in read speed, since applications perform more reads than
writes to the VMA tree. The overall performance of real applications
is either even or faster with the maple tree. This can be seen in the
kernel build times, where forked processes are short lived and would
be close to the worst case scenario.

This isn't to say we can't do better, and we are constantly working
towards faster performance. Please continue to report on the
performance.

Looking specifically at mmap1, it is mapping then unmapping in a tight
loop. The regression would be expected, considering the internals of
what is going on, but I don't believe this would ever happen in an
application that is doing what it is supposed to be doing.
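
For anyone following along, the pattern in question can be sketched
roughly as below. This is only an illustration of a map/unmap tight
loop, not the actual will-it-scale source; the helper name
map_unmap_loop() and its parameters are made up for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from will-it-scale): map an anonymous
 * region of `size` bytes and unmap it again, `iterations` times.
 * Every pass adds a VMA to the tree and then removes it, so the
 * loop is dominated by tree writes rather than lookups.
 * Returns the number of completed map/unmap pairs.
 */
unsigned long map_unmap_loop(unsigned long iterations, size_t size)
{
	unsigned long done = 0;

	assert(size > 0);

	for (unsigned long i = 0; i < iterations; i++) {
		void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			break;		/* stop on the first failed mapping */

		if (munmap(p, size) != 0)
			break;		/* stop on the first failed unmapping */

		done++;
	}

	return done;
}
```

Calling something like map_unmap_loop(1000000, 4096) from several
threads of one process would have them all modifying the same mm, so
nearly all of the time goes to the write side of the tree.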

If you find a real application that shows a performance regression,
please let us know.

>
> 524e00b36e8c5 mm: remove rb tree.
> 0c563f1480435 proc: remove VMA rbtree use from nommu
> d0cf3dd47f0d5 damon: convert __damon_va_three_regions to use the VMA iterator
> c9dbe82cb99db kernel/fork: use maple tree for dup_mmap() during forking
> 3499a13168da6 mm/mmap: use maple tree for unmapped_area{_topdown}
> 7fdbd37da5c6f mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
> be8432e7166ef mm/mmap: use the maple tree in find_vma() instead of the rbtree.
> 2e3af1db17442 mmap: use the VMA iterator in count_vma_pages_range()
> f39af05949a42 mm: add VMA iterator
> d4af56c5c7c67 mm: start tracking VMAs with maple tree
> e15e06a839232 lib/test_maple_tree: add testing for maple tree
>
> in testcase: will-it-scale
> on test machine: 104 threads 2 sockets (Skylake) with 192G memory
> with following parameters:
>
> nr_task: 50%
> mode: thread
> test: mmap1
> cpufreq_governor: performance
>
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
>
>
> We couldn't find out the commit that introduced this regression because
> some of above commits failed to boot during bisection, but looks it is
> related with maple tree code. Please check following details:

It is interesting that these boot failures were not detected by me or
by other build bots. Perhaps there is a configuration option that
wasn't tested. In any case, all of the listed commits were in
preparation for the last commit to remove the rb tree. Regardless of
which commit introduced the regression, what is being detected is the
fact that the maple tree is slower on writes.

Thanks,
Liam