Re: [PATCH 2/2] mm: zswap.c: remove RB tree

From: Chris Li
Date: Fri Jan 19 2024 - 00:44:00 EST


Hi Yosry,

On Wed, Jan 17, 2024 at 10:35 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> > @@ -493,45 +471,47 @@ static struct zswap_entry *zswap_search(struct zswap_tree *tree, pgoff_t offset)
> > static int zswap_insert(struct zswap_tree *tree, struct zswap_entry *entry,
> > struct zswap_entry **dupentry)
> > {
> > - struct rb_root *root = &tree->rbroot;
> > - struct rb_node **link = &root->rb_node, *parent = NULL;
> > - struct zswap_entry *myentry, *old;
> > - pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry);
> > -
> > -
> > - while (*link) {
> > - parent = *link;
> > - myentry = rb_entry(parent, struct zswap_entry, rbnode);
> > - myentry_offset = swp_offset(myentry->swpentry);
> > - if (myentry_offset > entry_offset)
> > - link = &(*link)->rb_left;
> > - else if (myentry_offset < entry_offset)
> > - link = &(*link)->rb_right;
> > - else {
> > - old = xa_load(&tree->xarray, entry_offset);
> > - BUG_ON(old != myentry);
> > - *dupentry = myentry;
> > + struct zswap_entry *e;
> > + pgoff_t offset = swp_offset(entry->swpentry);
> > + XA_STATE(xas, &tree->xarray, offset);
> > +
> > + do {
> > + xas_lock_irq(&xas);
> > + do {
> > + e = xas_load(&xas);
> > + if (xa_is_zero(e))
> > + e = NULL;
> > + } while (xas_retry(&xas, e));
> > + if (xas_valid(&xas) && e) {
> > + xas_unlock_irq(&xas);
> > + *dupentry = e;
> > return -EEXIST;
> > }
> > - }
> > - rb_link_node(&entry->rbnode, parent, link);
> > - rb_insert_color(&entry->rbnode, root);
> > - old = xa_store(&tree->xarray, entry_offset, entry, GFP_KERNEL);
> > - return 0;
> > + xas_store(&xas, entry);
> > + xas_unlock_irq(&xas);
> > + } while (xas_nomem(&xas, GFP_KERNEL));
> > + return xas_error(&xas);
>
> I think using the xas_* APIs can be avoided here. The only reason we
> need it is that we want to check if there's an existing entry first,
> and return -EEXIST. However, in that case, the caller will replace it
> anyway (and do some operations on the dupentry):

We might be able to for the insert case if we don't mind changing the
code behavior a bit. My original intent is to keep close to the
original zswap code and not stir the pot too much for the xarray
replacement. We can always make more adjustment once the RB tree is
gone.

> while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> WARN_ON(1);
> zswap_duplicate_entry++;
> zswap_invalidate_entry(tree, dupentry);
> }
>
> So I think we can do something like this in zswap_insert() instead:
>
> dupentry = xa_store(..);
> if (WARN_ON(dupentry)) {
> zswap_duplicate_entry++;
> zswap_invalidate_entry(tree, dupentry);
> }
>
> WDYT?
>
> > }
> >
> > static bool zswap_erase(struct zswap_tree *tree, struct zswap_entry *entry)
> > {
> > + struct zswap_entry *e;
> > pgoff_t offset = swp_offset(entry->swpentry);
> > - if (!RB_EMPTY_NODE(&entry->rbnode)) {
> > - struct zswap_entry *old;
> > - old = xa_erase(&tree->xarray, offset);
> > - BUG_ON(old != entry);
> > - rb_erase(&entry->rbnode, &tree->rbroot);
> > - RB_CLEAR_NODE(&entry->rbnode);
> > - return true;
> > - }
> > - return false;
> > + XA_STATE(xas, &tree->xarray, offset);
> > +
> > + do {
> > + xas_lock_irq(&xas);
> > + do {
> > + e = xas_load(&xas);
> > + } while (xas_retry(&xas, e));
> > + if (xas_valid(&xas) && e != entry) {
> > + xas_unlock_irq(&xas);
> > + return false;
> > + }
> > + xas_store(&xas, NULL);
> > + xas_unlock_irq(&xas);
> > + } while (xas_nomem(&xas, GFP_KERNEL));
> > + return !xas_error(&xas);
> > }
>
> Same here, I think we just want:
>
> return !!xa_erase(..);

For the erase case it is tricky.
The current zswap code does not erase an entry if the tree entry at
the same offset has been changed. It should be fine if the new entry
is NULL. Basically some race to remove the entry already. However, if
the entry is not NULL, then force resetting it to NULL will change
behavior compared to the current.

>From reading the zswap code, I am not sure this is in theory
impossible to happen: Some one races to remove the zswap entry then
reclaim converts that page back to the zswap, with a new entry. By the
time the zswap_erase() tries to erase the current entry, the entry has
changed to a new entry. It seems the right thing to do in this
unlikely case is that we should skip the force erase and drop the
current entry, assuming someone else has already invalidated it. Don't
touch the entry in the tree, we assume it is a newer generation.

If we can reason the above is impossible to happen, then the force
erase and reset the entry to NULL should be fine(Famous last word).
Noted that this is a behavior change, I would like to seperate it out
from the drop in replacement patch(keep the original behavior)

Chris




>
> >
> > static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
> > @@ -583,7 +563,6 @@ static void zswap_entry_put(struct zswap_tree *tree,
> >
> > WARN_ON_ONCE(refcount < 0);
> > if (refcount == 0) {
> > - WARN_ON_ONCE(!RB_EMPTY_NODE(&entry->rbnode));
> > zswap_free_entry(entry);
> > }
>
> nit: the braces are no longer needed here
>