Re: [RFC PATCH v2 6/7] mm: zswap: simplify writeback function

From: Domenico Cerasuolo
Date: Fri Jun 09 2023 - 07:06:29 EST


On Thu, Jun 8, 2023 at 6:48 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> Hi Domenico,
>
> On Tue, Jun 06, 2023 at 04:56:10PM +0200, Domenico Cerasuolo wrote:
> > Previously, in zswap, the writeback function was passed to zpool drivers
> > for their usage in calling the writeback operation. However, since the
> > drivers did not possess references to entries and the function was
> > specifically designed to work with handles, the writeback process has
> > been modified to occur directly within zswap.
>
> I'm having trouble parsing this sentence.
>
> > Consequently, this change allows for some simplification of the
> > writeback function, taking into account the updated workflow.
>
> How about:
>
> zswap_writeback_entry() used to be a callback for the backends, which
> don't know about struct zswap_entry.
>
> Now that the only user is the generic zswap LRU reclaimer, it can be
> simplified: pass the pinned zswap_entry directly, and consolidate the
> refcount management in the shrink function.

Sounds clearer, will update.

>
> > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@xxxxxxxxx>
> > ---
> > mm/zswap.c | 69 ++++++++++++++----------------------------------------
> > 1 file changed, 17 insertions(+), 52 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 2831bf56b168..ef8604812352 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -250,7 +250,8 @@ static bool zswap_has_pool;
> > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \
> > zpool_get_type((p)->zpool))
> >
> > -static int zswap_writeback_entry(struct zpool *pool, unsigned long handle);
> > +static int zswap_writeback_entry(struct zswap_entry *entry, struct zswap_header *zhdr,
> > + struct zswap_tree *tree);
> > static int zswap_pool_get(struct zswap_pool *pool);
> > static void zswap_pool_put(struct zswap_pool *pool);
> >
> > @@ -632,15 +633,21 @@ static int zswap_shrink(struct zswap_pool *pool)
> > }
> > spin_unlock(&tree->lock);
> >
> > - ret = zswap_writeback_entry(pool->zpool, lru_entry->handle);
> > + ret = zswap_writeback_entry(lru_entry, zhdr, tree);
> >
> > spin_lock(&tree->lock);
> > if (ret) {
> > spin_lock(&pool->lru_lock);
> > list_move(&lru_entry->lru, &pool->lru);
> > spin_unlock(&pool->lru_lock);
>
> This could use a comment.
>
> /* Writeback failed, put entry back on LRU */
>
> > + zswap_entry_put(tree, tree_entry);
>
> This put is a common factor in both branches, so you can consolidate.
>
> > + } else {
> > + /* free the local reference */
> > + zswap_entry_put(tree, tree_entry);
> > + /* free the entry if it's not been invalidated*/
>
> Missing space between text and */
>
> > + if (lru_entry == zswap_rb_search(&tree->rbroot, swpoffset))
> > + zswap_entry_put(tree, tree_entry);
> > }
> > - zswap_entry_put(tree, tree_entry);
> > spin_unlock(&tree->lock);
>
> The success path freeing (hopefully the common path) is now
> unfortunately hidden in fairly deep indentation. I think this would be
> better with a goto for the error case.
>
> All together, something like this?
>
> if (ret) {
> /* Writeback failed, put entry back on LRU */
> ...
> goto put_unlock;
> }
>
> /*
> * Writeback started successfully, the page now belongs to the
> * swapcache. Drop the base ref from the tree - unless invalidate
> * already took it out while we had the tree->lock released for IO.
> */
> if (lru_entry == zswap_rb_search(&tree->rb_root, swpoffset))
> zswap_entry_put(tree, entry);
>
> put_unlock:
> /* Drop local reference */
> zswap_entry_put(tree, tree_entry);
> spin_unlock(&tree->lock);
>
> return ret ? -EAGAIN : 0;
>

This feedback overlaps with the on in patch 1/7, I'm integrating them
so that in patch #1, the invalidation check is done only with rb search
and a first `goto unlock` for error.
Then here the base reference-drop is added after the `ret` check, and
errors go to `put_unlock`:

if (ret) {
/* Writeback failed, put entry back on LRU */
spin_lock(&pool->lru_lock);
list_move(&entry->lru, &pool->lru);
spin_unlock(&pool->lru_lock);
goto put_unlock;
}

/* Check for invalidate() race */
if (entry != zswap_rb_search(&tree->rbroot, swpoffset))
goto put_unlock;

/* Drop base reference */
zswap_entry_put(tree, entry);

put_unlock:
/* Drop local reference */
zswap_entry_put(tree, entry);
unlock:
spin_unlock(&tree->lock);
return ret ? -EAGAIN : 0;

> Btw, it's unsettling that we drop the tree reference without
> explicitly removing the entry for the tree. We rely on the final put
> that frees the entry to do tree removal, but this doesn't happen if
> somebody else is holding a reference; the entry can remain in the tree
> long after we've officially handed over ownership of the data to
> swapcache. TTBOMK there are currently no bugs because of that, but
> it's a data corruption waiting to happen.
>
> This should be:
>
> /*
> * Writeback started successfully, the page now belongs to the
> * swapcache. Drop the entry from zswap - unless invalidate already
> * took it out while we had the tree->lock released for IO.
> */
> if (entry == zswap_rb_search(&tree->rb_root, swpoffset))
> zswap_invalidate_entry(tree, entry);
>
> Would you care to send a follow-up patch?

Makes total sense, thanks will send a patch for this.