Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff()

From: Yosry Ahmed
Date: Mon Jan 22 2024 - 15:40:02 EST


On Mon, Jan 22, 2024 at 12:19 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Sat, Jan 20, 2024 at 02:40:07AM +0000, Yosry Ahmed wrote:
> > During swapoff, try_to_unuse() makes sure that zswap_invalidate() is
> > called for all swap entries before zswap_swapoff() is called. This means
> > that all zswap entries should already be removed from the tree. Simplify
> > zswap_swapoff() by removing the tree cleanup loop, and leaving an
> > assertion in its place.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>
> That's a great simplification.
>
> Removing the tree->lock made me double take, but at this point the
> swapfile and its cache should be fully dead and I don't see how any of
> the zswap operations that take tree->lock could race at this point.

It took me a while staring at the code to realize this loop is pointless.

However, while I have your attention on the swapoff path, there's a
slightly irrelevant problem that I think might be there, but I am not
sure.

It looks to me like swapoff can race with writeback, and there may be
a chance of UAF for the zswap tree. For example, if zswap_swapoff()
races with shrink_memcg_cb(), I feel like we may free the tree as it
is being used. For example if zswap_swapoff()->kfree(tree) happen
right before shrink_memcg_cb()->list_lru_isolate(l, item).

Please tell me that I am being paranoid and that there is some
protection against zswap writeback racing with swapoff. It feels like
we are very careful with zswap entries refcounting, but not with the
zswap tree itself.

>
> > ---
> > Chengming, Chris, I think this should make the tree split and the xarray
> > conversion patches simpler (especially the former). If others agree,
> > both changes can be rebased on top of this.
>
> The resulting code is definitely simpler, but this patch is not a
> completely trivial cleanup, either. If you put it before Chengming's
> patch and it breaks something, it would be difficult to pull out
> without affecting the tree split.

Are you suggesting I rebase this on top of Chengming's patches? I can
definitely do this, but the patch will be slightly less
straightforward, and if the tree split patches break something it
would be difficult to pull out as well. If you feel like this patch is
more likely to break things, I can rebase.