Re: [PATCH 01/11] maple_tree: Introduce ma_nonleaf_data_end{_nocheck}()

From: Peng Zhang
Date: Mon Jul 31 2023 - 05:54:30 EST




在 2023/7/26 22:58, Liam R. Howlett 写道:
* Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx> [230726 04:10]:
Introduce ma_nonleaf_data_end{_nocheck}() to get the data end of
non-leaf nodes without knowing the maximum value of nodes, so that any
ascending can be avoided even if the maximum value of nodes is not known.

The principle is that we introduce MAPLE_ENODE to mark an ENODE, which
cannot be used by metadata, so we can distinguish whether it is ENODE or
metadata.

The nocheck version is to avoid lockdep complaining in some scenarios
where no locks are held.

Signed-off-by: Peng Zhang <zhangpeng.00@xxxxxxxxxxxxx>
---
lib/maple_tree.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index a3d602cfd030..98e4fdf6f4b9 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -310,12 +310,19 @@ static inline void mte_set_node_dead(struct maple_enode *mn)
#define MAPLE_ENODE_TYPE_SHIFT 0x03
/* Bit 2 means a NULL somewhere below */
#define MAPLE_ENODE_NULL 0x04
+/* Bit 7 means this is an ENODE, instead of metadata */
+#define MAPLE_ENODE 0x80

We were saving this bit for more node types. I don't want to use this
bit for this reason since you could have done BFS to duplicate the tree
using the existing way to find the node end.
We have reserved 4 bits for the node type. I don't think there will be
more than 16 node types going forward.

Even the DFS that has been implemented can use the existing way to get
the data end. I didn't use it because when walking up the tree, we don't
know the maximum value of the node, and the continuous upward walk will
introduce more overhead, which is what mas_ascend() does. Doing BFS
cannot avoid this problem also.

The reason I don't do BFS is that it has more overhead than DFS. If you
think of a tree as a graph, doing DFS will only walk each edge twice.
What if it is BFS? Since we can't use queues, we can only emulate BFS,
which additionally does something like mas_next_node() does, which
introduces more overhead than DFS. Considering only the layer of leaf
nodes, it needs to walk each edge twice. So the overhead of doing BFS is
more than DFS.

Bits are highly valuable and this is the only remaining bit. I had
thought about using this in Feb 2021 to see if there was metadata or
not, but figured a way around it (using the max trick) and thus saved
this bit for potential expansion of node types.
I thought of another way to get the maximum value of a node without
doing any extra upward walk. When doing DFS, we can use a stack to save
the maximum value of ancestor nodes. The stack size can be set to
MAPLE_HEIGHT_MAX. In this way, this bit can be reserved, and there is no
need to do a loop like mas_ascend() in order to get the maximum value.

+
+static inline bool slot_is_mte(unsigned long slot)
+{
+ return slot & MAPLE_ENODE;
+}
static inline struct maple_enode *mt_mk_node(const struct maple_node *node,
enum maple_type type)
{
- return (void *)((unsigned long)node |
- (type << MAPLE_ENODE_TYPE_SHIFT) | MAPLE_ENODE_NULL);
+ return (void *)((unsigned long)node | (type << MAPLE_ENODE_TYPE_SHIFT) |
+ MAPLE_ENODE_NULL | MAPLE_ENODE);
}
static inline void *mte_mk_root(const struct maple_enode *node)
@@ -1411,6 +1418,65 @@ static inline struct maple_enode *mas_start(struct ma_state *mas)
return NULL;
}
+/*
+ * ma_nonleaf_data_end() - Find the end of the data in a non-leaf node.
+ * @mt: The maple tree
+ * @node: The maple node
+ * @type: The maple node type
+ *
+ * Uses metadata to find the end of the data when possible without knowing the
+ * node maximum.
+ *
+ * Return: The zero indexed last slot with child.
+ */
+static inline unsigned char ma_nonleaf_data_end(struct maple_tree *mt,
+ struct maple_node *node,
+ enum maple_type type)
+{
+ void __rcu **slots;
+ unsigned long slot;
+
+ slots = ma_slots(node, type);
+ slot = (unsigned long)mt_slot(mt, slots, mt_pivots[type]);
+ if (unlikely(slot_is_mte(slot)))
+ return mt_pivots[type];
+
+ return ma_meta_end(node, type);
+}
+
+/*
+ * ma_nonleaf_data_end_nocheck() - Find the end of the data in a non-leaf node.
+ * @node: The maple node
+ * @type: The maple node type
+ *
+ * Uses metadata to find the end of the data when possible without knowing the
+ * node maximum. This is the version of ma_nonleaf_data_end() that does not
+ * check for lock held. This particular version is designed to avoid lockdep
+ * complaining in some scenarios.
+ *
+ * Return: The zero indexed last slot with child.
+ */
+static inline unsigned char ma_nonleaf_data_end_nocheck(struct maple_node *node,
+ enum maple_type type)
+{
+ void __rcu **slots;
+ unsigned long slot;
+
+ slots = ma_slots(node, type);
+ slot = (unsigned long)rcu_dereference_raw(slots[mt_pivots[type]]);
+ if (unlikely(slot_is_mte(slot)))
+ return mt_pivots[type];
+
+ return ma_meta_end(node, type);
+}
+
+/* See ma_nonleaf_data_end() */
+static inline unsigned char mte_nonleaf_data_end(struct maple_tree *mt,
+ struct maple_enode *enode)
+{
+ return ma_nonleaf_data_end(mt, mte_to_node(enode), mte_node_type(enode));
+}
+
/*
* ma_data_end() - Find the end of the data in a node.
* @node: The maple node
--
2.20.1