Maple Tree Work

From: Liam R. Howlett
Date: Fri Jul 07 2023 - 12:39:46 EST


Hello,

I have received enquiries about the status of features for the maple
tree as well as requests for work. I've decided to track this on the
maple tree mailing list [1] for better coordination and open discussion.

Please keep all discussions on the maple tree mailing list.

I'll send out an updated list monthly with names against people working
on them to avoid work duplication, so don't work on something with a
name next to it.

If you want to work on something then please respond to this email so
everyone knows and I can add your name to the item. Feel free to ask
questions to clarify the feature or discuss directions.

Likewise, if you cannot work on anything anymore then let me know so
others can.

If you have any ideas, then please email the list. We can discuss it
and I can add it to the list.

Please sign up to the maple tree mailing list [1] to get these emails.

Features:
- Better preallocation calculations - Liam R. Howlett 2023/07/07
Without tracking the tree status on the walk down, we can
partially walk down to figure out the type of write and which
'worst case' a write will cause. Using this "write type"
information, the preallocations can be a better estimate. v2 was
sent to the mailing list [2].

- mas_store_gfp() align with mas_prealloc()
Right now mas_store_gfp() is going to allocate once it figures
out what to do and it does the worst case allocations. This is
inefficient and can easily be done better by doing more of what
mas_prealloc()/mas_store_prealloc() does - or at least will be
doing once the 'Better preallocation calculations' lands.

- Tracking tree status on walks down
Track in the maple state when the last minimum nodes, space for
2, or space for 3 slots occurred (height in the tree). This
will allow for a better estimate on how many nodes are needed
for a write. This can be complicated on next/prev walking, etc.
It has to be done in read mode since some walk may have been
done before deciding to write - see mmap_region(), for example.

- Store type (enum for store type?)
Extending the "Tracking tree status on walks down", the
information can then be used to determine what operation is
needed during a mas_prealloct(), etc. The operation can then be
stored in the maple state to continue on a mas_store_prealloc().
Obviously, this would work out well if we have mas_store_gfp()
using the same functionality as mentioned above.

- Full count/flag & Dense nodes
There is a bit that exists in the pointer reserved to indicate
there is no NULLs in the leaves below. This feature is mainly
for trees that are non-alloc trees. We can then know there is
at least one singleton that can be stored below this entry.
This is coupled with restoring Dense nodes as a potential node
type. The tricky part is deciding on when to switch to dense
nodes/back from dense nodes (all entries to dense nodes must be
singletons!). See mte_set_full(), mte_clear_full(),
mte_has_null() which use MAPLE_ENODE_NULL.

- Fork & Dup tree + Delete DONT_COPY
This is to optimize dup_mmap() in kernel/fork.c, but other
users may want faster duplications of a tree.
This should be faster than building a tree in bulk mode. The
idea is to just copy each node and replace the pointers in,
probably, a BFS order. Once the leaves are reached, the VMAs
will be replaced by the copies made in fork, unless DONT_COPY is
set, in which case the VMA will be deleted from the copied tree.
DONT_COPY is not common and since the tree isn't visible, all
nodes should be available for reuse (no RCU worries).

- Push reuse parent
During an insert, new nodes can be "pushed" left or right -
see mas_push_data(). On a left push, it may be possible to
reuse the node by making the write like an 'append'. This may
also be possible to be done during a split operation when the
write extends the last slot. This needs examining to ensure RCU
safety.

Larger Features:
- 64 bit stores on 32 bit arch
A number of users want to use the maple tree, but cannot because
they need a storage medium that supports 64 bit stores on 32 bit
arch.

- wr_mas with alloc instead of mas
Internally, we have a write maple state, but the maple state
currently holds the allocations. I'm not sure if this is worth
it, and it will probably need "Tracking tree status on walks
down" so that preallocations/allocations can be accurate.

- Big Dense Node Type
There are various node types that could be added to the maple
tree. One of the most useful is probably a 'big dense' with the
node being a 4k block of singletons. This would have to come
after the dense node type as dense enables more users and helps
scope this work.

- Judy Array
The Judy Array is another fancy data structure. Mathieu
Desnoyers has enquired and spoken to us about incorporating judy
arrays as a node type or other features he has in the judy
array. This is in the early thought stages.

Please note that any new features will need test cases added to ensure
they function correctly.

[1] https://lists.infradead.org/mailman/listinfo/maple-tree
[2] http://lists.infradead.org/pipermail/maple-tree/2023-June/002599.html