Re: [PATCH v2 2/6] rust: rbtree: add red-black tree implementation backed by the C version

From: Benno Lossin
Date: Thu Mar 14 2024 - 10:20:06 EST


On 2/19/24 12:48, Matt Gilbride wrote:
> +impl<K, V> RBTree<K, V>
> +where
> + K: Ord,
> +{
> + /// Tries to insert a new value into the tree.
> + ///
> + /// It overwrites a node if one already exists with the same key and returns it (containing the
> + /// key/value pair). Returns [`None`] if a node with the same key didn't already exist.
> + ///
> + /// Returns an error if it cannot allocate memory for the new node.
> + pub fn try_create_and_insert(&mut self, key: K, value: V) -> Result<Option<RBTreeNode<K, V>>> {
> + Ok(self.insert(Self::try_allocate_node(key, value)?))
> + }
> +
> + /// Inserts a new node into the tree.
> + ///
> + /// It overwrites a node if one already exists with the same key and returns it (containing the
> + /// key/value pair). Returns [`None`] if a node with the same key didn't already exist.
> + ///
> + /// This function always succeeds.
> + pub fn insert(&mut self, node: RBTreeNode<K, V>) -> Option<RBTreeNode<K, V>> {
> + let RBTreeNode { node } = node;

You can do this pattern deconstruction directly in the function
signature.

> + let node = Box::into_raw(node);
> + // SAFETY: `node` is valid at least until we call `Box::from_raw`, which only happens when
> + // the node is removed or replaced.
> + let node_links = unsafe { addr_of_mut!((*node).links) };
> + let mut new_link: &mut *mut bindings::rb_node = &mut self.rootrb_node;

Is the only reason for this being a double pointer that `rb_link_node`
requires that as its last argument? If yes, then I would just add a
`&mut` there and give this the simpler type.

Also a more fitting name IMO would be `cur_node` or similar.

> + let mut parent = core::ptr::null_mut();

I would suggest naming this `prev_node` or similar.

> + while !new_link.is_null() {
> + // SAFETY: All links fields we create are in a `Node<K, V>`.

Suggestion:
// SAFETY: By the type invariant of `Self`, all non-null `rb_node` pointers stored in `self`
// point to the links field of `Node<K, V>` objects.

> + let this = unsafe { crate::container_of!(*new_link, Node<K, V>, links) };

The `container_of` macro is used 3 times, I think it's nicer to import
it.

> +
> + parent = *new_link;
> +
> + // SAFETY: `this` is a non-null node so it is valid by the type invariants. `node` is
> + // valid until the node is removed.
> + match unsafe { (*node).key.cmp(&(*this).key) } {
> + // SAFETY: `parent` is a non-null node so it is valid by the type invariants.
> + Ordering::Less => new_link = unsafe { &mut (*parent)rb_left },

I would use `*new_link` instead of `parent` here.

> + // SAFETY: `parent` is a non-null node so it is valid by the type invariants.
> + Ordering::Greater => new_link = unsafe { &mut (*parent).rb_right },

Ditto.

> + Ordering::Equal => {
> + // INVARIANT: We are replacing an existing node with a new one, which is valid.
> + // It remains valid because we "forgot" it with `Box::into_raw`.
> + // SAFETY: All pointers are non-null and valid (parent, despite the name, really
> + // is the node we're replacing).
> + unsafe { bindings::rb_replace_node(parent, node_links, &mut self.root) };

If you use `*new_link` instead of `parent` (and perform the rename) then
you don't need the note in the parenthesis in the safety comment.

> +
> + // INVARIANT: The node is being returned and the caller may free it, however,
> + // it was removed from the tree. So the invariants still hold.
> + return Some(RBTreeNode {
> + // SAFETY: `this` was a node in the tree, so it is valid.
> + node: unsafe { Box::from_raw(this as _) },

Why do you need this cast? Can it be replaced by `.cast()` or
`.cast_mut()`?

> + });
> + }
> + }
> + }
> +
> + // INVARIANT: We are linking in a new node, which is valid. It remains valid because we
> + // "forgot" it with `Box::into_raw`.
> + // SAFETY: All pointers are non-null and valid (`*new_link` is null, but `new_link` is a
> + // mutable reference).
> + unsafe { bindings::rb_link_node(node_links, parent, new_link) };
> +
> + // SAFETY: All pointers are valid. `node` has just been inserted into the tree.
> + unsafe { bindings::rb_insert_color(node_links, &mut self.root) };
> + None
> + }
> +
> + /// Returns a node with the given key, if one exists.
> + fn find(&self, key: &K) -> Option<NonNull<Node<K, V>>> {
> + let mut node = self.root.rb_node;
> + while !node.is_null() {
> + // SAFETY: All links fields we create are in a `Node<K, V>`.

See above suggestion.

> + let this = unsafe { crate::container_of!(node, Node<K, V>, links) };
> + // SAFETY: `this` is a non-null node so it is valid by the type invariants.
> + node = match key.cmp(unsafe { &(*this).key }) {
> + // SAFETY: `node` is a non-null node so it is valid by the type invariants.
> + Ordering::Less => unsafe { (*node).rb_left },
> + // SAFETY: `node` is a non-null node so it is valid by the type invariants.
> + Ordering::Greater => unsafe { (*node).rb_right },
> + Ordering::Equal => return NonNull::new(this as _),

Why do you need this cast? Can it be replaced by `.cast()` or
`.cast_mut()`?

> + }
> + }

If you modify this function to return the parent node if `key` were in
the tree, then you could use this in `insert` instead of having to write
two loops.

> + None
> + }
> +
> + /// Returns a reference to the value corresponding to the key.
> + pub fn get(&self, key: &K) -> Option<&V> {
> + // SAFETY: The `find` return value is a node in the tree, so it is valid.
> + self.find(key).map(|node| unsafe { &node.as_ref().value })
> + }
> +
> + /// Returns a mutable reference to the value corresponding to the key.
> + pub fn get_mut(&mut self, key: &K) -> Option<&mut V> {
> + // SAFETY: The `find` return value is a node in the tree, so it is valid.
> + self.find(key)
> + .map(|mut node| unsafe { &mut node.as_mut().value })
> + }
> +
> + /// Removes the node with the given key from the tree.
> + ///
> + /// It returns the node that was removed if one exists, or [`None`] otherwise.
> + fn remove_node(&mut self, key: &K) -> Option<RBTreeNode<K, V>> {
> + let mut node = self.find(key)?;
> +
> + // SAFETY: The `find` return value is a node in the tree, so it is valid.
> + unsafe { bindings::rb_erase(&mut node.as_mut().links, &mut self.root) };
> +
> + // INVARIANT: The node is being returned and the caller may free it, however, it was
> + // removed from the tree. So the invariants still hold.
> + Some(RBTreeNode {
> + // SAFETY: The `find` return value was a node in the tree, so it is valid.
> + node: unsafe { Box::from_raw(node.as_ptr()) },
> + })
> + }
> +
> + /// Removes the node with the given key from the tree.
> + ///
> + /// It returns the value that was removed if one exists, or [`None`] otherwise.
> + pub fn remove(&mut self, key: &K) -> Option<V> {
> + let node = self.remove_node(key)?;
> + let RBTreeNode { node } = node;
> + let Node {
> + links: _,
> + key: _,
> + value,
> + } = *node;
> + Some(value)

This could be a one-liner:
self.remove_node(key).map(|node| node.node.value)

> + }
> +}
> +
> +impl<K, V> Default for RBTree<K, V> {
> + fn default() -> Self {
> + Self::new()
> + }
> +}
> +
> +impl<K, V> Drop for RBTree<K, V> {
> + fn drop(&mut self) {
> + // SAFETY: `root` is valid as it's embedded in `self` and we have a valid `self`.
> + let mut next = unsafe { bindings::rb_first_postorder(&self.root) };
> +
> + // INVARIANT: The loop invariant is that all tree nodes from `next` in postorder are valid.
> + while !next.is_null() {
> + // SAFETY: All links fields we create are in a `Node<K, V>`.
> + let this = unsafe { crate::container_of!(next, Node<K, V>, links) };
> +
> + // Find out what the next node is before disposing of the current one.
> + // SAFETY: `next` and all nodes in postorder are still valid.
> + next = unsafe { bindings::rb_next_postorder(next) };
> +
> + // INVARIANT: This is the destructor, so we break the type invariant during clean-up,
> + // but it is not observable. The loop invariant is still maintained.
> + // SAFETY: `this` is valid per the loop invariant.
> + unsafe { drop(Box::from_raw(this as *mut Node<K, V>)) };

Use `.cast_mut()` instead of `as ...`.

> + }
> + }
> +}
> +
> +/// A memory reservation for a red-black tree node.
> +///
> +/// It contains the memory needed to hold a node that can be inserted into a red-black tree. One
> +/// can be obtained by directly allocating it ([`RBTree::try_reserve_node`]).
> +pub struct RBTreeNodeReservation<K, V> {
> + node: Box<MaybeUninit<Node<K, V>>>,
> +}
> +
> +// SAFETY: An [`RBTree`] allows the same kinds of access to its values that a struct allows to its
> +// fields, so we use the same Send condition as would be used for a struct with K and V fields.
> +unsafe impl<K: Send, V: Send> Send for RBTreeNodeReservation<K, V> {}
> +
> +// SAFETY: An [`RBTree`] allows the same kinds of access to its values that a struct allows to its
> +// fields, so we use the same Sync condition as would be used for a struct with K and V fields.
> +unsafe impl<K: Sync, V: Sync> Sync for RBTreeNodeReservation<K, V> {}

The two safety comments are copy-pasted. `RBTreeNodeReservation` does not
allow any kind of access to its values.

> +
> +impl<K, V> RBTreeNodeReservation<K, V> {
> + /// Initialises a node reservation.
> + ///
> + /// It then becomes an [`RBTreeNode`] that can be inserted into a tree.
> + pub fn into_node(mut self, key: K, value: V) -> RBTreeNode<K, V> {
> + let node_ptr = self.node.as_mut_ptr();
> + // SAFETY: `node_ptr` is valid, and so are its fields.
> + unsafe { addr_of_mut!((*node_ptr).links).write(bindings::rb_node::default()) };
> + // SAFETY: `node_ptr` is valid, and so are its fields.
> + unsafe { addr_of_mut!((*node_ptr).key).write(key) };
> + // SAFETY: `node_ptr` is valid, and so are its fields.
> + unsafe { addr_of_mut!((*node_ptr).value).write(value) };
> + let raw = Box::into_raw(self.node);
> + RBTreeNode {
> + // SAFETY: The pointer came from a `MaybeUninit<Node>` whose fields have all been
> + // initialised. Additionally, it has the same layout as `Node`.
> + node: unsafe { Box::from_raw(raw as _) },
> + }

Instead of doing `into_raw` and `from_raw`, I would use
`Box::assume_init` (it is unstable via `new_uninit`).

> + }
> +}
> +
> +/// A red-black tree node.
> +///
> +/// The node is fully initialised (with key and value) and can be inserted into a tree without any
> +/// extra allocations or failure paths.
> +pub struct RBTreeNode<K, V> {
> + node: Box<Node<K, V>>,
> +}
> +
> +// SAFETY: An [`RBTree`] allows the same kinds of access to its values that a struct allows to its
> +// fields, so we use the same Send condition as would be used for a struct with K and V fields.
> +unsafe impl<K: Send, V: Send> Send for RBTreeNode<K, V> {}
> +
> +// SAFETY: An [`RBTree`] allows the same kinds of access to its values that a struct allows to its
> +// fields, so we use the same Sync condition as would be used for a struct with K and V fields.
> +unsafe impl<K: Sync, V: Sync> Sync for RBTreeNode<K, V> {}

The two safety comments are copy-pasted. `RBTreeNode` does not
allow any kind of access to its values.

--
Cheers,
Benno

>
> --
> 2.44.0.rc0.258.g7320e95886-goog
>