[PATCH v2 00/70] RFC mm: Introducing the Maple Tree

From: Liam R. Howlett
Date: Tue Jan 12 2021 - 11:14:34 EST


***NOTE: the majority of this patch is the tests (35855 lines).***

Thanks to Vlastimil Babka for helping with perf on the full kernel
build and looking at the cache numbers.

Changes since last RFC:
- rebased against 5.10 and updated benchmarks
- Transitioned more internal mm operations to using maple states
- Removed the mm_struct linked list as well as the maximum address
- Replaced external linked list operations with maple tree iterators or
searches

The maple tree is an RCU-safe, range-based B-tree designed to use modern
processor caches efficiently. There are a number of places in the kernel where a
non-overlapping range-based tree would be beneficial, especially one with a
simple interface. The first user covered in this patch set is the
vm_area_struct rbtree in the mm_struct, with the long-term goal of reducing
contention on the mmap_sem.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes.
With the increased branching factor, it is significantly shorter than the
rbtree, so it has fewer cache misses. As you can see below, the performance is
getting very close even without the vmacache. I am looking for ways to optimize
the vma code to make this even faster and would welcome input. As this tree
requires allocations to provide RCU functionality, it is worth avoiding
expanding the tree only to contract it in the next step.

This patch set is based on 5.10.

It is important to note that this is an RFC and there are limitations around
what is currently supported.

The current status of this release is:
- **No support for 32 bit or nommu builds**; expect test robot emails
- Removal of the vmacache
- Removal of the mm_struct linked list and highest VMA end.

- Increase in performance in the following micro-benchmarks (Hmean):
- will-it-scale page_fault1-threads upper limit increased by 20%
- will-it-scale page_fault2-threads -5 to +45% (disregarding the dip in the
rbtree at 30-48 threads)
- will-it-scale malloc1-threads +24.5 to +1168%; the runs with 30, 48, 79,
110, and 141 threads are all over 100% better
- will-it-scale brk1-threads **This test doesn't make sense, disregard**
- will-it-scale pthread_mutex1-threads -4.6 to +17%
- will-it-scale signal1-processes -6 to +10.7%
- will-it-scale signal1-threads -0.7 to +11%

- Decrease in performance in the following micro-benchmarks:
- will-it-scale page_fault3-processes -3.5 to -7%
- will-it-scale brk1 -40 to -48% (as written, this test measures the insert
speed of a vma rather than brk itself; see the note in the raw data below)
- will-it-scale malloc1-processes -0.7 to -13%

- page_fault3-threads is worth noting: the graph is bumpy, with a higher base
value but a lower upper value.

- kernbench elapse time is relatively flat. Please see performance
details for more information.

- Converted the following to use the advanced maple tree interface:
mm/mmap.c
- brk()
- mmap_region()
- do_munmap()
- dup_mmap()
mm/memory.c
- free_pgtables()
- unmap_vmas()

- Currently operating in non-RCU mode. Once turned on, RCU mode will
automatically enable when more than one thread is active on a task.

The long term goal of the maple tree is to reduce mmap_sem contention by
removing users that cause contention one by one. This may lead to the lock
being completely removed at some point.

A secondary goal is to add a fast RCU-safe tree with a simple interface to
the kernel and to clean up the mm code by removing the integration of
the tree within the code and structures themselves.


Link: https://github.com/oracle/linux-uek/releases/tag/howlett%2Fmaple%2F20210105


============================================================================
Implementation details:

Each node is 256 bytes, and there are multiple node types. The current
implementation uses two node types, with a branching factor of 16 for
leaf/non-alloc nodes and 10 for internal alloc nodes. The advantage of the
internal alloc nodes is the ability to track the largest sub-tree gap for each
entry.

With the removal of the vmacache and the linked list, most benchmarks are close
to achieving parity and some are now above what the rbtree achieves. I'd still
like some help with figuring out how to make building kernels faster.

The tree is at a disadvantage in needing to allocate storage space for internal
use, whereas the rbtree stored its data within the vm_area_struct. This is a
necessary trade-off to move to RCU mode in the future.


performance details and raw data:

kernbench system time is up by ~4%, but elapsed time is pretty much the
same.

Perf analysis (cache counters) of a 140-thread kernel build shows the following:

With v2 RFC of the maple tree:
14,692,670.92 msec task-clock # 68.720 CPUs utilized
43,983,754,174,774 cycles # 2.994 GHz
35,221,280,320,157 instructions # 0.80 insn per cycle
548,066,089,628 cache-references # 37.302 M/sec
39,632,749,650 cache-misses # 7.231 % of all cache refs

213.804939614 seconds time elapsed

12822.727778000 seconds user
1768.516799000 seconds sys

Vanilla 5.10:
14,535,940.58 msec task-clock # 67.025 CPUs utilized
43,541,026,636,749 cycles # 2.995 GHz
35,025,757,470,518 instructions # 0.80 insn per cycle
535,675,044,717 cache-references # 36.852 M/sec
38,837,910,214 cache-misses # 7.250 % of all cache refs

216.872446418 seconds time elapsed

12785.056085000 seconds user
1653.652012000 seconds sys


0.555% more instructions were executed (195,522,849,639 more)
The cache hit rate improved by 0.019 percentage points

I *think* the system is able to keep the CPUs busy more often due to the
increased cache hit rate. However, the added instructions reduce or remove the
advantage at this point.

I had thought there was more CPU used in-kernel but that the bottleneck
of the mmap_sem lock was still the runtime speed-limiting factor. As
Vlastimil pointed out, this doesn't make sense 'because kernel build
should be lots of separate single-threaded processes (gcc) so where
would that contention come from'.

I would appreciate any insightful comments on the results.


Benchmark numbers from mmtests on a 144 core system:

wis-signal 5.10 maple tree
Hmean signal1-processes-2 1786040.86 ( 0.00%) 1810613.96 * 1.38%*
Hmean signal1-processes-5 4495852.24 ( 0.00%) 4456978.23 * -0.86%*
Hmean signal1-processes-8 6962785.32 ( 0.00%) 6928809.21 * -0.49%*
Hmean signal1-processes-12 9996679.72 ( 0.00%) 9960859.26 * -0.36%*
Hmean signal1-processes-21 5800266.56 ( 0.00%) 6424291.55 * 10.76%*
Hmean signal1-processes-30 5461129.81 ( 0.00%) 5965843.00 * 9.24%*
Hmean signal1-processes-48 5402662.10 ( 0.00%) 5373518.13 * -0.54%*
Hmean signal1-processes-79 6324458.64 ( 0.00%) 6373855.01 * 0.78%*
Hmean signal1-processes-110 7888582.74 ( 0.00%) 7715943.65 * -2.19%*
Hmean signal1-processes-141 9708237.79 ( 0.00%) 9144247.40 * -5.81%*


wis-pf 5.10 maple tree
Hmean page_fault3-processes-2 1528339.53 ( 0.00%) 1418181.17 * -7.21%*
Hmean page_fault3-processes-5 3844682.44 ( 0.00%) 3568964.13 * -7.17%*
Hmean page_fault3-processes-8 5962218.35 ( 0.00%) 5582064.77 * -6.38%*
Hmean page_fault3-processes-12 8627107.79 ( 0.00%) 8161496.35 * -5.40%*
Hmean page_fault3-processes-21 13714558.07 ( 0.00%) 13137400.10 * -4.21%*
Hmean page_fault3-processes-30 19982269.52 ( 0.00%) 18935995.56 * -5.24%*
Hmean page_fault3-processes-48 31128230.35 ( 0.00%) 29117987.37 * -6.46%*
Hmean page_fault3-processes-79 46534013.37 ( 0.00%) 43858063.38 * -5.75%*
Hmean page_fault3-processes-110 51560408.08 ( 0.00%) 49113317.22 * -4.75%*
Hmean page_fault3-processes-141 56715596.66 ( 0.00%) 54704417.46 * -3.55%*

Hmean page_fault1-threads-2 1845141.09 ( 0.00%) 1845041.57 * -0.01%*
Hmean page_fault1-threads-5 4170927.21 ( 0.00%) 4174427.49 * 0.08%*
Hmean page_fault1-threads-8 5808980.21 ( 0.00%) 5887126.65 * 1.35%*
Hmean page_fault1-threads-12 5925792.38 ( 0.00%) 5905272.94 * -0.35%*
Hmean page_fault1-threads-21 5871532.58 ( 0.00%) 6440880.05 * 9.70%*
Hmean page_fault1-threads-30 6832271.78 ( 0.00%) 6861522.91 * 0.43%*
Hmean page_fault1-threads-48 8614245.56 ( 0.00%) 8922534.13 * 3.58%*
Hmean page_fault1-threads-79 11609583.54 ( 0.00%) 11257758.66 * -3.03%*
Hmean page_fault1-threads-110 11594428.36 ( 0.00%) 12005585.11 * 3.55%*
Hmean page_fault1-threads-141 11314690.18 ( 0.00%) 13623989.67 * 20.41%*

Hmean page_fault2-threads-2 806638.35 ( 0.00%) 790103.29 * -2.05%*
Hmean page_fault2-threads-5 1689522.43 ( 0.00%) 1705998.81 * 0.98%*
Hmean page_fault2-threads-8 2117462.47 ( 0.00%) 2002244.16 * -5.44%*
Hmean page_fault2-threads-12 2130151.22 ( 0.00%) 2181799.70 * 2.42%*
Hmean page_fault2-threads-21 1428004.62 ( 0.00%) 2076765.86 * 45.43%*
Hmean page_fault2-threads-30 1163606.91 ( 0.00%) 2090400.49 * 79.65%*
Hmean page_fault2-threads-48 605893.29 ( 0.00%) 1712174.51 * 182.59%*
Hmean page_fault2-threads-79 1401105.11 ( 0.00%) 1948016.56 * 39.03%*
Hmean page_fault2-threads-110 2107269.70 ( 0.00%) 2063247.07 * -2.09%*
Hmean page_fault2-threads-141 1148808.86 ( 0.00%) 1643422.67 * 43.05%*

Hmean page_fault3-threads-2 1370710.48 ( 0.00%) 1359083.92 * -0.85%*
Hmean page_fault3-threads-5 3010390.55 ( 0.00%) 2488127.21 * -17.35%*
Hmean page_fault3-threads-8 3214674.74 ( 0.00%) 2502612.56 * -22.15%*
Hmean page_fault3-threads-12 3443935.69 ( 0.00%) 2788050.49 * -19.04%*
Hmean page_fault3-threads-21 1500648.03 ( 0.00%) 1748934.69 * 16.55%*
Hmean page_fault3-threads-30 1633922.78 ( 0.00%) 1644905.78 * 0.67%*
Hmean page_fault3-threads-48 1357568.84 ( 0.00%) 1501556.63 * 10.61%*
Hmean page_fault3-threads-79 1731774.85 ( 0.00%) 1527178.31 * -11.81%*
Hmean page_fault3-threads-110 1867416.16 ( 0.00%) 1918662.69 * 2.74%*
Hmean page_fault3-threads-141 1901439.58 ( 0.00%) 1960142.22 * 3.09%*


wis-malloc 5.10 maple tree
Hmean brk1-processes-2 4547264.11 ( 0.00%) 2711590.85 * -40.37%*
Hmean brk1-processes-5 11396047.63 ( 0.00%) 6723760.18 * -41.00%*
Hmean brk1-processes-8 17684347.79 ( 0.00%) 9850073.01 * -44.30%*
Hmean brk1-processes-12 25803477.74 ( 0.00%) 13551301.45 * -47.48%*
Hmean brk1-processes-21 42083608.45 ( 0.00%) 22484839.92 * -46.57%*
Hmean brk1-processes-30 61095769.10 ( 0.00%) 31853867.49 * -47.86%*
Hmean brk1-processes-48 95924485.69 ( 0.00%) 50239694.48 * -47.63%*
Hmean brk1-processes-79 144415437.49 ( 0.00%) 75287655.57 * -47.87%*
Hmean brk1-processes-110 161729976.95 ( 0.00%) 83624072.84 * -48.29%*
Hmean brk1-processes-141 178985545.93 ( 0.00%) 92036828.36 * -48.58%*
*Note: This isn't testing brk; it's testing insertion of a VMA.*

Hmean malloc1-processes-2 513481.82 ( 0.00%) 447458.20 * -12.86%*
Hmean malloc1-processes-5 1192865.83 ( 0.00%) 1052137.79 * -11.80%*
Hmean malloc1-processes-8 1290698.38 ( 0.00%) 1202802.10 * -6.81%*
Hmean malloc1-processes-12 1357323.31 ( 0.00%) 1300602.17 * -4.18%*
Hmean malloc1-processes-21 2052572.96 ( 0.00%) 1975328.24 * -3.76%*
Hmean malloc1-processes-30 2793186.04 ( 0.00%) 2700345.64 * -3.32%*
Hmean malloc1-processes-48 4087424.14 ( 0.00%) 4059035.58 * -0.69%*
Hmean malloc1-processes-79 5185320.24 ( 0.00%) 5120912.56 * -1.24%*
Hmean malloc1-processes-110 5231498.33 ( 0.00%) 4991315.64 * -4.59%*
Hmean malloc1-processes-141 5096423.47 ( 0.00%) 4847749.27 * -4.88%*

Hmean malloc1-threads-2 14119.77 ( 0.00%) 179149.22 *1168.78%*
Hmean malloc1-threads-5 120466.20 ( 0.00%) 158994.30 * 31.98%*
Hmean malloc1-threads-8 103323.01 ( 0.00%) 128684.83 * 24.55%*
Hmean malloc1-threads-12 102365.88 ( 0.00%) 127458.10 * 24.51%*
Hmean malloc1-threads-21 55230.52 ( 0.00%) 97276.98 * 76.13%*
Hmean malloc1-threads-30 40144.81 ( 0.00%) 87899.80 * 118.96%*
Hmean malloc1-threads-48 18651.63 ( 0.00%) 57501.70 * 208.29%*
Hmean malloc1-threads-79 5987.82 ( 0.00%) 36411.66 * 508.10%*
Hmean malloc1-threads-110 9425.14 ( 0.00%) 19021.17 * 101.81%*
Hmean malloc1-threads-141 6176.71 ( 0.00%) 22104.21 * 257.86%*

wis-pthreadmutex 5.10 maple tree
Hmean pthread_mutex1-threads-2 18757795.65 ( 0.00%) 18234843.22 * -2.79%*
Hmean pthread_mutex1-threads-5 10875114.75 ( 0.00%) 12748240.14 * 17.22%*
Hmean pthread_mutex1-threads-8 13474657.37 ( 0.00%) 14906619.63 * 10.63%*
Hmean pthread_mutex1-threads-12 14686773.11 ( 0.00%) 14096530.35 * -4.02%*
Hmean pthread_mutex1-threads-21 14253992.75 ( 0.00%) 13893250.09 * -2.53%*
Hmean pthread_mutex1-threads-30 15327949.47 ( 0.00%) 14613516.32 * -4.66%*
Hmean pthread_mutex1-threads-48 14892473.83 ( 0.00%) 15177204.93 * 1.91%*
Hmean pthread_mutex1-threads-79 14333125.23 ( 0.00%) 14254569.45 * -0.55%*
Hmean pthread_mutex1-threads-110 13623757.15 ( 0.00%) 14067267.02 * 3.26%*
Hmean pthread_mutex1-threads-141 13097069.36 ( 0.00%) 13320433.23 * 1.71%*

wis-signal 5.10 maple tree

Hmean signal1-threads-2 1253290.69 ( 0.00%) 1304356.96 * 4.07%*
Hmean signal1-threads-5 1105142.15 ( 0.00%) 1219370.11 * 10.34%*
Hmean signal1-threads-8 1085584.56 ( 0.00%) 1206909.95 * 11.18%*
Hmean signal1-threads-12 1113346.88 ( 0.00%) 1218394.23 * 9.44%*
Hmean signal1-threads-21 817087.31 ( 0.00%) 852237.46 * 4.30%*
Hmean signal1-threads-30 657827.07 ( 0.00%) 695077.88 * 5.66%*
Hmean signal1-threads-48 523114.49 ( 0.00%) 562605.91 * 7.55%*
Hmean signal1-threads-79 500027.91 ( 0.00%) 527026.99 * 5.40%*
Hmean signal1-threads-110 480231.16 ( 0.00%) 498458.54 * 3.80%*
Hmean signal1-threads-141 468149.75 ( 0.00%) 464573.98 * -0.76%*


kernbench 5.10-rc1 maple tree
Amean user-2 887.32 ( 0.00%) 882.62 * 0.53%*
Amean syst-2 145.77 ( 0.00%) 152.04 * -4.30%*
Amean elsp-2 522.10 ( 0.00%) 522.97 * -0.17%*
Amean user-4 902.24 ( 0.00%) 898.12 * 0.46%*
Amean syst-4 149.70 ( 0.00%) 156.35 * -4.44%*
Amean elsp-4 269.58 ( 0.00%) 271.99 * -0.90%*
Amean user-8 920.12 ( 0.00%) 913.07 * 0.77%*
Amean syst-8 149.97 ( 0.00%) 155.61 * -3.76%*
Amean elsp-8 141.55 ( 0.00%) 141.58 * -0.03%*
Amean user-16 935.31 ( 0.00%) 926.23 * 0.97%*
Amean syst-16 151.97 ( 0.00%) 157.41 * -3.58%*
Amean elsp-16 77.37 ( 0.00%) 77.35 * 0.04%*
Amean user-32 982.76 ( 0.00%) 977.36 * 0.55%*
Amean syst-32 162.56 ( 0.00%) 167.90 * -3.29%*
Amean elsp-32 46.17 ( 0.00%) 46.08 * 0.19%*
Amean user-64 1093.08 ( 0.00%) 1081.16 * 1.09%*
Amean syst-64 177.43 ( 0.00%) 183.63 * -3.49%*
Amean elsp-64 30.80 ( 0.00%) 30.28 * 1.68%*
Amean user-128 1611.20 ( 0.00%) 1610.98 * 0.01%*
Amean syst-128 234.32 ( 0.00%) 244.25 * -4.24%*
Amean elsp-128 25.76 ( 0.00%) 25.73 * 0.12%*
Amean user-256 1733.42 ( 0.00%) 1739.23 * -0.34%*
Amean syst-256 248.58 ( 0.00%) 259.16 * -4.26%*
Amean elsp-256 25.17 ( 0.00%) 25.44 * -1.07%*
Amean user-288 1734.83 ( 0.00%) 1737.28 * -0.14%*
Amean syst-288 249.63 ( 0.00%) 259.55 * -3.97%*
Amean elsp-288 25.43 ( 0.00%) 25.45 * -0.09%*


gitcheckout

Amean User 0.00 ( 0.00%) 0.00 * 0.00%*
Amean System 8.07 ( 0.00%) 7.92 * 1.83%*
Amean Elapsed 23.46 ( 0.00%) 22.91 * 2.35%*
Amean CPU 91.87 ( 0.00%) 92.33 * -0.51%*

Liam R. Howlett (70):
radix tree test suite: Enhancements for Maple Tree
radix tree test suite: Add support for fallthrough attribute
radix tree test suite: Add support for kmem_cache_free_bulk
radix tree test suite: Add keme_cache_alloc_bulk() support
Maple Tree: Add new data structure
mm: Start tracking VMAs with maple tree
mm/mmap: Introduce unlock_range() for code cleanup
mm/mmap: Change find_vma() to use the maple tree
mm/mmap: Change find_vma_prev() to use maple tree
mm/mmap: Change unmapped_area and unmapped_area_topdown to use maple
tree
kernel/fork: Convert dup_mmap to use maple tree
mm: Remove rb tree.
mm/gup: Add mm_populate_vma() for use when the vma is known
mm/mmap: Change do_brk_flags() to expand existing VMA and add
do_brk_munmap()
mm/mmap: Change vm_brk_flags() to use mm_populate_vma()
mm: Move find_vma_intersection to mmap.c and change implementation to
maple tree.
mm/mmap: Change mmap_region to use maple tree state
mm/mmap: Drop munmap_vma_range()
mm: Remove vmacache
mm/mmap: Change __do_munmap() to avoid unnecessary lookups.
mm/mmap: Move mmap_region() below do_munmap()
mm/mmap: Add do_mas_munmap() and wraper for __do_munmap()
mmap: Use find_vma_intersection in do_mmap() for overlap
mmap: Remove __do_munmap() in favour of do_mas_munmap()
mm/mmap: Change do_brk_munmap() to use do_mas_align_munmap()
mmap: make remove_vma_list() inline
mm: Introduce vma_next() and vma_prev()
arch/arm64: Remove mmap linked list from vdso.
arch/parsic: Remove mmap linked list from kernel/cache
arch/powerpc: Remove mmap linked list from mm/book2s32/tlb
arch/powerpc: Remove mmap linked list from mm/book2s32/subpage_prot
arch/powerpc: Optimize cell spu task sync.
arch/s390: Use maple tree iterators instead of linked list.
arch/um: Use maple tree iterators instead of linked list
arch/x86: Use maple tree iterators for vdso/vma
arch/xtensa: Use maple tree iterators for unmapped area
drivers/misc/cxl: Use maple tree iterators for cxl_prefault_vma()
drivers/oprofile: Lookup address in tree instead of linked list.
drivers/tee/optee: Use maple tree iterators for __check_mem_type()
fs/binfmt_elf: Use maple tree iterators for fill_files_note()
fs/coredump: Use maple tree iterators in place of linked list
fs/exec: Use vma_next() instead of linked list
fs/proc/base: Use maple tree iterators in place of linked list
fs/proc/task_mmu: Stop using linked list and highest_vm_end
fs/userfaultfd: Stop using vma linked list.
ipc/shm: Stop using the vma linked list
kernel/acct: Use maple tree iterators instead of linked list
kernel/events/core: Use maple tree iterators instead of linked list
kernel/events/uprobes: Use maple tree iterators instead of linked list
kernel/sched/fair: Use maple tree iterators instead of linked list
kernel/sys: Use maple tree iterators instead of linked list
mm/gup: Use maple tree navigation instead of linked list
mm/huge_memory: Use vma_next() instead of vma linked list
mm/khugepaged: Use maple tree iterators instead of vma linked list
mm/ksm: Use maple tree iterators instead of vma linked list
mm/madvise: Use vma_next instead of vma linked list
mm/memcontrol: Stop using mm->highest_vm_end
mm/mempolicy: Use maple tree iterators instead of vma linked list
mm/mlock: Use maple tree iterators instead of vma linked list
mm/mprotect: Use maple tree navigation instead of vma linked list
mm/mremap: Use vma_next() instead of vma linked list
mm/msync: Use vma_next() instead of vma linked list
mm/nommu: Use maple tree iterators instead of vma linked list
mm/oom_kill: Use maple tree iterators instead of vma linked list
mm/pagewalk: Use vma_next() instead of vma linked list
mm/swapfile: Use maple tree iterator instead of vma linked list
mm/nommu: Stop inserting into the vma linked list
mm/util: Remove __vma_link_list() and __vma_unlink_list()
mm: Remove vma linked list.
mm/mmap: Convert __insert_vm_struct to use mas, convert vma_link to
use vma_mas_link()

Documentation/core-api/index.rst | 1 +
Documentation/core-api/maple-tree.rst | 36 +
MAINTAINERS | 12 +
arch/arm64/kernel/vdso.c | 5 +-
arch/parisc/kernel/cache.c | 8 +-
arch/powerpc/mm/book3s32/tlb.c | 3 +-
arch/powerpc/mm/book3s64/subpage_prot.c | 13 +-
arch/powerpc/oprofile/cell/spu_task_sync.c | 22 +-
arch/s390/mm/gmap.c | 6 +-
arch/um/kernel/tlb.c | 14 +-
arch/x86/entry/vdso/vma.c | 7 +-
arch/x86/kernel/tboot.c | 2 +-
arch/xtensa/kernel/syscall.c | 3 +-
drivers/firmware/efi/efi.c | 2 +-
drivers/misc/cxl/fault.c | 3 +-
drivers/oprofile/buffer_sync.c | 14 +-
drivers/tee/optee/call.c | 13 +-
fs/binfmt_elf.c | 3 +-
fs/coredump.c | 13 +-
fs/exec.c | 7 +-
fs/proc/base.c | 5 +-
fs/proc/task_mmu.c | 43 +-
fs/userfaultfd.c | 24 +-
include/linux/maple_tree.h | 439 +
include/linux/mm.h | 49 +-
include/linux/mm_types.h | 34 +-
include/linux/mm_types_task.h | 5 -
include/linux/sched.h | 1 -
include/linux/sched/mm.h | 3 +
include/linux/vmacache.h | 28 -
include/trace/events/maple_tree.h | 227 +
include/trace/events/mmap.h | 71 +
init/main.c | 2 +
ipc/shm.c | 13 +-
kernel/acct.c | 6 +-
kernel/debug/debug_core.c | 12 -
kernel/events/core.c | 3 +-
kernel/events/uprobes.c | 9 +-
kernel/fork.c | 48 +-
kernel/sched/fair.c | 10 +-
kernel/sys.c | 3 +-
lib/Makefile | 3 +-
lib/maple_tree.c | 5700 +++
lib/test_maple_tree.c | 35855 ++++++++++++++++
mm/Makefile | 2 +-
mm/debug.c | 12 +-
mm/gup.c | 27 +-
mm/huge_memory.c | 6 +-
mm/init-mm.c | 4 +-
mm/internal.h | 4 +-
mm/khugepaged.c | 7 +-
mm/ksm.c | 18 +-
mm/madvise.c | 2 +-
mm/memcontrol.c | 6 +-
mm/memory.c | 39 +-
mm/mempolicy.c | 33 +-
mm/mlock.c | 20 +-
mm/mmap.c | 2022 +-
mm/mprotect.c | 8 +-
mm/mremap.c | 13 +-
mm/msync.c | 2 +-
mm/nommu.c | 14 +-
mm/oom_kill.c | 3 +-
mm/pagewalk.c | 2 +-
mm/swapfile.c | 3 +-
mm/util.c | 32 -
mm/vmacache.c | 117 -
tools/testing/radix-tree/.gitignore | 2 +
tools/testing/radix-tree/Makefile | 13 +-
tools/testing/radix-tree/generated/autoconf.h | 1 +
tools/testing/radix-tree/linux.c | 78 +-
tools/testing/radix-tree/linux/kernel.h | 8 +
tools/testing/radix-tree/linux/maple_tree.h | 3 +
tools/testing/radix-tree/linux/slab.h | 2 +
tools/testing/radix-tree/maple.c | 59 +
tools/testing/radix-tree/test.h | 1 +
.../radix-tree/trace/events/maple_tree.h | 8 +
77 files changed, 43844 insertions(+), 1507 deletions(-)
create mode 100644 Documentation/core-api/maple-tree.rst
create mode 100644 include/linux/maple_tree.h
delete mode 100644 include/linux/vmacache.h
create mode 100644 include/trace/events/maple_tree.h
create mode 100644 lib/maple_tree.c
create mode 100644 lib/test_maple_tree.c
delete mode 100644 mm/vmacache.c
create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
create mode 100644 tools/testing/radix-tree/maple.c
create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h

--
2.28.0