Re: [PATCH v5 03/10] maple_tree: Introduce interfaces __mt_dup() and mtree_dup()

From: Liam R. Howlett
Date: Tue Oct 17 2023 - 09:58:31 EST


* Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx> [231015 23:23]:
> Introduce interfaces __mt_dup() and mtree_dup(), which are used to
> duplicate a maple tree. They duplicate a maple tree in Depth-First
> Search (DFS) pre-order traversal. It uses memcopy() to copy nodes in the
> source tree and allocate new child nodes in non-leaf nodes. The new node
> is exactly the same as the source node except for all the addresses
> stored in it. It will be faster than traversing all elements in the
> source tree and inserting them one by one into the new tree. The time
> complexity of these two functions is O(n).
>
> The difference between __mt_dup() and mtree_dup() is that mtree_dup()
> handles locks internally.
>
> Analysis of the average time complexity of this algorithm:
>
> For simplicity, let's assume that the maximum branching factor of all
> non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a
> full tree.
>
> Under the given conditions, if there is a maple tree with n elements,
> the number of its leaves is n/16. From bottom to top, the number of
> nodes in each level is 1/16 of the number of nodes in the level below.
> So the total number of nodes in the entire tree is given by the sum of
> n/16 + n/16^2 + n/16^3 + ... + 1. This is a geometric series, and it has
> log(n) terms with base 16. According to the formula for the sum of a
> geometric series, the sum of this series can be calculated as (n-1)/15.
> Each node has only one parent node pointer, which can be considered as
> an edge. In total, there are (n-1)/15-1 edges.
>
> This algorithm consists of two operations:
>
> 1. Traversing all nodes in DFS order.
> 2. For each node, making a copy and performing necessary modifications
> to create a new node.
>
> For the first part, DFS traversal will visit each edge twice. Let
> T(ascend) represent the cost of taking one step downwards, and
> T(descend) represent the cost of taking one step upwards. And both of
> them are constants (although mas_ascend() may not be, as it contains a
> loop, but here we ignore it and treat it as a constant). So the time
> spent on the first part can be represented as
> ((n-1)/15-1) * (T(ascend) + T(descend)).
>
> For the second part, each node will be copied, and the cost of copying a
> node is denoted as T(copy_node). For each non-leaf node, it is necessary
> to reallocate all child nodes, and the cost of this operation is denoted
> as T(dup_alloc). The behavior behind memory allocation is complex and
> not specific to the maple tree operation. Here, we assume that the time
> required for a single allocation is constant. Since the size of a node
> is fixed, both of these symbols are also constants. We can calculate
> that the time spent on the second part is
> ((n-1)/15) * T(copy_node) + ((n-1)/15 - n/16) * T(dup_alloc).
>
> Adding both parts together, the total time spent by the algorithm can be
> represented as:
>
> ((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) -
> n/16 * T(dup_alloc) - (T(ascend) + T(descend))
>
> Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)
> Let C2 = T(dup_alloc)
> Let C3 = T(ascend) + T(descend)
>
> Finally, the expression can be simplified as:
> ((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3).
>
> This is a linear function, so the average time complexity is O(n).
>
> Signed-off-by: Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx>
> ---
> include/linux/maple_tree.h | 3 +
> lib/maple_tree.c | 290 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 293 insertions(+)
>
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index f91dbc7fe091..a452dd8a1e5c 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -329,6 +329,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index,
> void *entry, gfp_t gfp);
> void *mtree_erase(struct maple_tree *mt, unsigned long index);
>
> +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
> +
> void mtree_destroy(struct maple_tree *mt);
> void __mt_destroy(struct maple_tree *mt);
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index ca7039633844..6e0ad83f14e3 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -4,6 +4,10 @@
> * Copyright (c) 2018-2022 Oracle Corporation
> * Authors: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> * Matthew Wilcox <willy@xxxxxxxxxxxxx>
> + *
> + * Algorithm for duplicating Maple Tree
> + * Copyright (c) 2023 ByteDance
> + * Author: Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx>
> */
>
> /*
> @@ -6475,6 +6479,292 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index)
> }
> EXPORT_SYMBOL(mtree_erase);
>
> +/*
> + * mas_dup_free() - Free an incomplete duplication of a tree.
> + * @mas: The maple state of a incomplete tree.
> + *
> + * The parameter @mas->node passed in indicates that the allocation failed on
> + * this node. This function frees all nodes starting from @mas->node in the
> + * reverse order of mas_dup_build(). There is no need to hold the source tree
> + * lock at this time.
> + */
> +static void mas_dup_free(struct ma_state *mas)
> +{
> + struct maple_node *node;
> + enum maple_type type;
> + void __rcu **slots;
> + unsigned char count, i;
> +
> + /* Maybe the first node allocation failed. */
> + if (mas_is_none(mas))
> + return;
> +
> + while (!mte_is_root(mas->node)) {
> + mas_ascend(mas);
> +

Please watch the extra whitespace. There are a few in this patch.

> + if (mas->offset) {
> + mas->offset--;
> + do {
> + mas_descend(mas);
> + mas->offset = mas_data_end(mas);
> + } while (!mte_is_leaf(mas->node));
> +
> + mas_ascend(mas);
> + }
> +
> + node = mte_to_node(mas->node);
> + type = mte_node_type(mas->node);
> + slots = ma_slots(node, type);
> + count = mas_data_end(mas) + 1;
> + for (i = 0; i < count; i++)
> + ((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK;
> +
> + mt_free_bulk(count, slots);
> + }
> +
> + node = mte_to_node(mas->node);
> + mt_free_one(node);
> +}
> +
> +/*
> + * mas_copy_node() - Copy a maple node and replace the parent.
> + * @mas: The maple state of source tree.
> + * @new_mas: The maple state of new tree.
> + * @parent: The parent of the new node.
> + *
> + * Copy @mas->node to @new_mas->node, set @parent to be the parent of
> + * @new_mas->node. If memory allocation fails, @mas is set to -ENOMEM.
> + */
> +static inline void mas_copy_node(struct ma_state *mas, struct ma_state *new_mas,
> + struct maple_pnode *parent)
> +{
> + struct maple_node *node = mte_to_node(mas->node);
> + struct maple_node *new_node = mte_to_node(new_mas->node);
> + unsigned long val;
> +
> + /* Copy the node completely. */
> + memcpy(new_node, node, sizeof(struct maple_node));
> +
> + /* Update the parent node pointer. */
> + val = (unsigned long)node->parent & MAPLE_NODE_MASK;
> + new_node->parent = ma_parent_ptr(val | (unsigned long)parent);
> +}
> +
> +/*
> + * mas_dup_alloc() - Allocate child nodes for a maple node.
> + * @mas: The maple state of source tree.
> + * @new_mas: The maple state of new tree.
> + * @gfp: The GFP_FLAGS to use for allocations.
> + *
> + * This function allocates child nodes for @new_mas->node during the duplication
> + * process. If memory allocation fails, @mas is set to -ENOMEM.
> + */
> +static inline void mas_dup_alloc(struct ma_state *mas, struct ma_state *new_mas,
> + gfp_t gfp)
> +{
> + struct maple_node *node = mte_to_node(mas->node);
> + struct maple_node *new_node = mte_to_node(new_mas->node);
> + enum maple_type type;
> + unsigned char request, count, i;
> + void __rcu **slots;
> + void __rcu **new_slots;
> + unsigned long val;
> +
> + /* Allocate memory for child nodes. */
> + type = mte_node_type(mas->node);
> + new_slots = ma_slots(new_node, type);
> + request = mas_data_end(mas) + 1;
> + count = mt_alloc_bulk(gfp, request, (void **)new_slots);
> + if (unlikely(count < request)) {
> + if (count)
> + mt_free_bulk(count, new_slots);

We were dropping this mt_free_bulk() call as discussed in [1]. Did I
miss something?

> +
> + memset(new_slots, 0, request * sizeof(void *));
> + mas_set_err(mas, -ENOMEM);
> + return;
> + }
> +
> + /* Restore node type information in slots. */
> + slots = ma_slots(node, type);
> + for (i = 0; i < count; i++) {
> + val = (unsigned long)mt_slot_locked(mas->tree, slots, i);
> + val &= MAPLE_NODE_MASK;
> + ((unsigned long *)new_slots)[i] |= val;
> + }
> +}
> +
> +/*
> + * mas_dup_build() - Build a new maple tree from a source tree
> + * @mas: The maple state of source tree, need to be in MAS_START state.
> + * @new_mas: The maple state of new tree, need to be in MAS_START state.
> + * @gfp: The GFP_FLAGS to use for allocations.
> + *
> + * This function builds a new tree in DFS preorder. If the memory allocation
> + * fails, the error code -ENOMEM will be set in @mas, and @new_mas points to the
> + * last node. mas_dup_free() will free the incomplete duplication of a tree.
> + *
> + * Note that the attributes of the two trees need to be exactly the same, and the
> + * new tree needs to be empty, otherwise -EINVAL will be set in @mas.
> + */
> +static inline void mas_dup_build(struct ma_state *mas, struct ma_state *new_mas,
> + gfp_t gfp)
> +{
> + struct maple_node *node;
> + struct maple_pnode *parent = NULL;
> + struct maple_enode *root;
> + enum maple_type type;
> +
> + if (unlikely(mt_attr(mas->tree) != mt_attr(new_mas->tree)) ||
> + unlikely(!mtree_empty(new_mas->tree))) {
> + mas_set_err(mas, -EINVAL);
> + return;
> + }
> +
> + mas_start(mas);
> + if (mas_is_ptr(mas) || mas_is_none(mas)) {
> + root = mt_root_locked(mas->tree);

mas_start(mas) would return the root entry if it's a pointer and NULL if
the tree is empty, so this can be written:
root = mas_start(mas);
if (mas_is_ptry() || mas_is_none()
goto set_new_tree;


> + goto set_new_tree;
> + }
> +
> + node = mt_alloc_one(gfp);
> + if (!node) {
> + new_mas->node = MAS_NONE;
> + mas_set_err(mas, -ENOMEM);
> + return;
> + }
> +
> + type = mte_node_type(mas->node);
> + root = mt_mk_node(node, type);
> + new_mas->node = root;
> + new_mas->min = 0;
> + new_mas->max = ULONG_MAX;
> + root = mte_mk_root(root);
> +
> + while (1) {
> + mas_copy_node(mas, new_mas, parent);
> +
> + if (!mte_is_leaf(mas->node)) {
> + /* Only allocate child nodes for non-leaf nodes. */
> + mas_dup_alloc(mas, new_mas, gfp);
> + if (unlikely(mas_is_err(mas)))
> + return;
> + } else {
> + /*
> + * This is the last leaf node and duplication is
> + * completed.
> + */
> + if (mas->max == ULONG_MAX)
> + goto done;
> +
> + /* This is not the last leaf node and needs to go up. */
> + do {
> + mas_ascend(mas);
> + mas_ascend(new_mas);
> + } while (mas->offset == mas_data_end(mas));
> +
> + /* Move to the next subtree. */
> + mas->offset++;
> + new_mas->offset++;
> + }
> +
> + mas_descend(mas);
> + parent = ma_parent_ptr(mte_to_node(new_mas->node));
> + mas_descend(new_mas);
> + mas->offset = 0;
> + new_mas->offset = 0;
> + }
> +done:
> + /* Specially handle the parent of the root node. */
> + mte_to_node(root)->parent = ma_parent_ptr(mas_tree_parent(new_mas));
> +set_new_tree:
> + /* Make them the same height */
> + new_mas->tree->ma_flags = mas->tree->ma_flags;
> + rcu_assign_pointer(new_mas->tree->ma_root, root);
> +}
> +
> +/**
> + * __mt_dup(): Duplicate an entire maple tree
> + * @mt: The source maple tree
> + * @new: The new maple tree
> + * @gfp: The GFP_FLAGS to use for allocations
> + *
> + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order
> + * traversal. It uses memcopy() to copy nodes in the source tree and allocate
> + * new child nodes in non-leaf nodes. The new node is exactly the same as the
> + * source node except for all the addresses stored in it. It will be faster than
> + * traversing all elements in the source tree and inserting them one by one into
> + * the new tree.
> + * The user needs to ensure that the attributes of the source tree and the new
> + * tree are the same, and the new tree needs to be an empty tree, otherwise
> + * -EINVAL will be returned.
> + * Note that the user needs to manually lock the source tree and the new tree.
> + *
> + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If
> + * the attributes of the two trees are different or the new tree is not an empty
> + * tree.
> + */
> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
> +{
> + int ret = 0;
> + MA_STATE(mas, mt, 0, 0);
> + MA_STATE(new_mas, new, 0, 0);
> +
> + mas_dup_build(&mas, &new_mas, gfp);
> +
> + if (unlikely(mas_is_err(&mas))) {
> + ret = xa_err(mas.node);
> + if (ret == -ENOMEM)
> + mas_dup_free(&new_mas);
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(__mt_dup);
> +
> +/**
> + * mtree_dup(): Duplicate an entire maple tree
> + * @mt: The source maple tree
> + * @new: The new maple tree
> + * @gfp: The GFP_FLAGS to use for allocations
> + *
> + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order
> + * traversal. It uses memcopy() to copy nodes in the source tree and allocate
> + * new child nodes in non-leaf nodes. The new node is exactly the same as the
> + * source node except for all the addresses stored in it. It will be faster than
> + * traversing all elements in the source tree and inserting them one by one into
> + * the new tree.
> + * The user needs to ensure that the attributes of the source tree and the new
> + * tree are the same, and the new tree needs to be an empty tree, otherwise
> + * -EINVAL will be returned.
> + *
> + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If
> + * the attributes of the two trees are different or the new tree is not an empty
> + * tree.
> + */
> +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
> +{
> + int ret = 0;
> + MA_STATE(mas, mt, 0, 0);
> + MA_STATE(new_mas, new, 0, 0);
> +
> + mas_lock(&new_mas);
> + mas_lock_nested(&mas, SINGLE_DEPTH_NESTING);
> +
> + mas_dup_build(&mas, &new_mas, gfp);
> + mas_unlock(&mas);
> +
> + if (unlikely(mas_is_err(&mas))) {
> + ret = xa_err(mas.node);
> + if (ret == -ENOMEM)
> + mas_dup_free(&new_mas);
> + }
> +
> + mas_unlock(&new_mas);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(mtree_dup);
> +
> /**
> * __mt_destroy() - Walk and free all nodes of a locked maple tree.
> * @mt: The maple tree
> --
> 2.20.1
>

[1]. https://lore.kernel.org/lkml/20231004142500.gz2552r74aiphl4z@revolver/

Thanks,
Liam