Re: [PATCH] mm: zswap: fix potential memory corruption on duplicate store

From: Johannes Weiner
Date: Mon Sep 25 2023 - 08:25:22 EST


On Mon, Sep 25, 2023 at 10:36:06AM +0200, Domenico Cerasuolo wrote:
> On Fri, Sep 22, 2023 at 7:42 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> >
> > On Fri, Sep 22, 2023 at 07:22:11PM +0200, Domenico Cerasuolo wrote:
> > > While stress-testing zswap, memory corruption was occurring when pages
> > > were written back. __frontswap_store() used to check for duplicate
> > > entries before attempting to store a page in zswap; this was needed
> > > because if the store fails, the old entry isn't removed from the tree.
> > > This change removes duplicate entries in zswap_store() before the
> > > actual store attempt.
> > >
> > > Based on commit ce9ecca0238b ("Linux 6.6-rc2")
> > >
> > > Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@xxxxxxxxx>
> >
> > Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> >
> > > @@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
> > > if (!zswap_enabled || !tree)
> > > return false;
> > >
> > > + /*
> > > + * If this is a duplicate, it must be removed before attempting to store
> > > + * it; otherwise, if the store fails, the old page won't be removed from
> > > + * the tree and might be written back, overwriting the new data.
> > > + */
> > > + spin_lock(&tree->lock);
> > > + dupentry = zswap_rb_search(&tree->rbroot, offset);
> > > + if (dupentry) {
> > > + zswap_duplicate_entry++;
> > > + zswap_invalidate_entry(tree, dupentry);
> > > + }
> > > + spin_unlock(&tree->lock);
> >
> > Do we still need the dupe handling at the end of the function then?
> >
> > The dupe store happens because a page that's already in swapcache has
> > changed and we're trying to swap_writepage() it again with new data.
> >
> > But the page is locked at this point, pinning the swap entry. So even
> > after the tree lock is dropped I don't see how *another* store to the
> > tree at this offset could occur while we're compressing.
>
> My reasoning here was that frontswap used to invalidate a dupe right before
> calling store(), so I thought that the check at the end of zswap_store() was
> actually needed.

Ok, the git history is actually quite enlightening as to how this came
to be. Initially, frontswap didn't invalidate duplicates; only zswap
did. Then someone ran into exactly the scenario you encountered:

commit fb993fa1a2f669215fa03a09eed7848f2663e336
Author: Weijie Yang <weijie.yang@xxxxxxxxxxx>
Date: Tue Dec 2 15:59:25 2014 -0800

mm: frontswap: invalidate expired data on a dup-store failure

If a frontswap dup-store failed, it should invalidate the expired page
in the backend, or it could trigger some data corruption issue.
Such as:
1. use zswap as the frontswap backend with writeback feature
2. store a swap page(version_1) to entry A, success
3. dup-store a newer page(version_2) to the same entry A, fail
4. use __swap_writepage() write version_2 page to swapfile, success
5. zswap do shrink, writeback version_1 page to swapfile
6. version_2 page is overwritten by version_1, data corrupted.

This patch fixes this issue by invalidating expired data immediately
when meet a dup-store failure.

This split the invalidation duty: on store success, zswap would
invalidate the dupe; on store failure, frontswap would.
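
From memory, the failure path in __frontswap_store() ended up looking
roughly like this after that commit (paraphrasing, not the exact hunk):

	ret = frontswap_ops->store(type, offset, page);
	if (ret) {
		/*
		 * Failed dup-store: drop the stale old copy from the
		 * backend, or a later writeback of it would clobber
		 * the newer version in the swapfile - the corruption
		 * described above.
		 */
		inc_frontswap_failed_stores();
		if (dup) {
			__frontswap_clear(sis, offset);
			frontswap_ops->invalidate_page(type, offset);
		}
	}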

Then this patch moved frontswap's invalidation to before the store:

commit d1dc6f1bcf1e998e7ce65fc120da371ab047a999
Author: Dan Streetman <ddstreet@xxxxxxxx>
Date: Wed Jun 24 16:58:18 2015 -0700

frontswap: allow multiple backends

at which point the after-store invalidate in zswap became unnecessary.
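
For reference, the after-store handling we're talking about looks like
this in zswap_store() today (quoting from memory, so roughly):

	/* map */
	spin_lock(&tree->lock);
	do {
		ret = zswap_rb_insert(&tree->rbroot, entry, &dupentry);
		if (ret == -EEXIST) {
			zswap_duplicate_entry++;
			/* remove from rbtree */
			zswap_invalidate_entry(tree, dupentry);
		}
	} while (ret == -EEXIST);
	spin_unlock(&tree->lock);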

> Since we acquire the tree lock anyway to add the new entry to the LRU, wouldn't
> checking the result of zswap_rb_insert be a very cheap sanity check? We could
> treat it as an error and fail the store(), or just add a warning and still
> invalidate the dupe?

Yeah, it's less about performance and more about code clarity.

A warning probably makes the most sense. It gives us the sanity check
and an opportunity to document expectations wrt the swapcache, while
keeping the code resilient against data corruption.
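
Something like this, on top of your patch (untested sketch):

	/* map */
	spin_lock(&tree->lock);
	/*
	 * The folio is locked and in the swapcache, pinning the swap
	 * entry, so no new dupe should have shown up since the
	 * invalidation at the top of this function. If the insert
	 * still finds one, that's a bug - warn, but keep invalidating
	 * so we never leave stale data behind.
	 */
	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
		WARN_ON(1);
		zswap_duplicate_entry++;
		zswap_invalidate_entry(tree, dupentry);
	}
	spin_unlock(&tree->lock);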