Re: [PATCH 00/14] Reduce preallocations for maple tree

From: Yin, Fengwei
Date: Sun Jun 04 2023 - 08:10:53 EST


Hi Liam,

On 6/3/2023 2:55 AM, Liam R. Howlett wrote:
> * Yin, Fengwei <fengwei.yin@xxxxxxxxx> [230602 04:11]:
>> Hi Liam,
>>
>> On 6/1/2023 10:15 AM, Liam R. Howlett wrote:
>>> Initial work on preallocations showed no regression in performance
>>> during testing, but recently some users (both on [1] and off [android]
>>> list) have reported that preallocating the worst-case number of nodes
>>> has caused some slowdown. This patch set addresses the number of
>>> allocations in a few ways.
>>>
>>> During munmap() most munmap() operations will remove a single VMA, so
>>> leverage the fact that the maple tree can place a single pointer at
>>> range 0 - 0 without allocating. This is done by changing the index in
>>> the 'sidetree'.
>>>
>>> Re-introduce the entry argument to mas_preallocate() so that a more
>>> intelligent guess of the node count can be made.
>>>
>>> Patches are in the following order:
>>> 0001-0002: Testing framework for benchmarking some operations
>>> 0003-0004: Reduction of maple node allocation in sidetree
>>> 0005: Small cleanup of do_vmi_align_munmap()
>>> 0006-0013: mas_preallocate() calculation change
>>> 0014: Change the vma iterator order
>> I ran the AIM9 page_test on an Ice Lake 48C/96T + 192G RAM platform with
>> this patchset.
>>
>> The result shows a small improvement:
>> Base (next-20230602):
>> 503880
>> Base with this patchset:
>> 519501
>>
>> But both are still far from the result without the regression (commit 7be1c1a3c7b1):
>> 718080
>>
>>
>> Some other information I collected:
>> With Base, mas_alloc_nodes() is always hit with a request of 7.
>> With this patchset, the requests are 1 or 5.
>>
>> I suppose this is the reason for the improvement from 503880 to 519501.
>>
>> With commit 7be1c1a3c7b1, mas_store_gfp() in do_brk_flags() never triggered
>> a mas_alloc_nodes() call. Thanks.
>
> Thanks for retesting. I've not been able to see the regression myself.
> Are you running in a VM of sorts? Android and some cloud VMs seem to
> see this, but I do not in kvm or the server I test on.

I didn't run it in a VM. I ran it in a native (bare-metal) environment.

>
> I am still looking to reduce/reverse the regression and a reproducer on
> my end would help.

The test is page_test from AIM9. You can get the AIM9 test suite from:
http://nchc.dl.sourceforge.net/project/aimbench/aim-suite9

After building it, you will find the singleuser binary.

It needs a text file named s9workfile that defines the test cases. The
s9workfile I am using has the following content:

# @(#) s9workfile:1.2 1/22/96 00:00:00
# AIM Independent Resource Benchmark - Suite IX Workfile
FILESIZE: 5M
page_test

Then you can run the test with:
./singleuser -nl

It asks a few configuration questions and then runs the actual test.

One thing to take care of: create-clo.c has this line:
newbrk = sbrk(-4096 * 16);

It should be changed to:
intptr_t inc = -4096 * 16;
newbrk = sbrk(inc);

Otherwise, -4096 * 16 is treated as a 32-bit value, and instead of
shrinking brk by 64K the call tries to extend it by around 4G. If there is
not enough RAM, the brk syscall will fail.

If you run into any issues running the test, just ping me. Thanks.


Regards
Yin, Fengwei

>
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>> [1] https://lore.kernel.org/linux-mm/202305061457.ac15990c-yujie.liu@xxxxxxxxx/
>>>
>>> Liam R. Howlett (14):
>>> maple_tree: Add benchmarking for mas_for_each
>>> maple_tree: Add benchmarking for mas_prev()
>>> mm: Move unmap_vmas() declaration to internal header
>>> mm: Change do_vmi_align_munmap() side tree index
>>> mm: Remove prev check from do_vmi_align_munmap()
>>> maple_tree: Introduce __mas_set_range()
>>> mm: Remove re-walk from mmap_region()
>>> maple_tree: Re-introduce entry to mas_preallocate() arguments
>>> mm: Use vma_iter_clear_gfp() in nommu
>>> mm: Set up vma iterator for vma_iter_prealloc() calls
>>> maple_tree: Move mas_wr_end_piv() below mas_wr_extend_null()
>>> maple_tree: Update mas_preallocate() testing
>>> maple_tree: Refine mas_preallocate() node calculations
>>> mm/mmap: Change vma iteration order in do_vmi_align_munmap()
>>>
>>> fs/exec.c | 1 +
>>> include/linux/maple_tree.h | 23 ++++-
>>> include/linux/mm.h | 4 -
>>> lib/maple_tree.c | 78 ++++++++++----
>>> lib/test_maple_tree.c | 74 +++++++++++++
>>> mm/internal.h | 40 ++++++--
>>> mm/memory.c | 16 ++-
>>> mm/mmap.c | 171 ++++++++++++++++---------------
>>> mm/nommu.c | 45 ++++----
>>> tools/testing/radix-tree/maple.c | 59 ++++++-----
>>> 10 files changed, 331 insertions(+), 180 deletions(-)
>>>