Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios

From: Yosry Ahmed
Date: Tue Nov 14 2023 - 12:16:45 EST


+Ying

On Mon, Nov 13, 2023 at 5:06 AM Zhongkun He
<hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
>
> I recently found two scenarios where zswap entry could not be
> released, which will cause shrink_worker and active recycling
> to fail.
> 1)The swap entry has been freed, but cached in swap_slots_cache,
> no swap cache and swapcount=0.
> 2)When the option zswap_exclusive_loads_enabled disabled and
> zswap_load completed(page in swap_cache and swapcount = 0).

For case (1), I think a cleaner solution would be to move the
zswap_invalidate() call from swap_range_free() (which is called after
the cached slots are freed) to __swap_entry_free_locked() if the usage
goes to 0. I actually think conceptually this makes sense not just for
zswap_invalidate(), but also for the arch call, memcg uncharging, etc.
Slots caching is a swapfile optimization that should be internal to
swapfile code. Once a swap entry is freed (i.e. swap count is 0 AND
not in the swap cache), all the hooks should be called (memcg, zswap,
arch, ..) as the swap entry is effectively freed. The fact that
swapfile code internally batches and caches slots should be
transparent to other parts of MM. I am not sure if the calls can just
be moved or if there are underlying assumptions in the implementation
that would be broken, but it feels like the right thing to do.
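
To make the ordering concrete, here is a rough userspace sketch. It is a
toy model, not kernel code: every name in it (toy_swap, toy_put_slot(),
toy_free_hooks(), hook_on_last_put, ...) is hypothetical, and the single
per-slot count stands in for "swap count is 0 AND not in the swap cache".
It only shows that running the hooks at cache-drain time leaves a stale
zswap copy visible, while running them on the last put does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#define NR_SLOTS    8
#define CACHE_BATCH 4

struct toy_swap {
	unsigned char count[NR_SLOTS];  /* usage count per slot */
	bool zswap_has_copy[NR_SLOTS];  /* stands in for a zswap entry */
	int cached[CACHE_BATCH];        /* freed slots batched for later */
	size_t nr_cached;
	bool hook_on_last_put;          /* false: hooks run at drain time */
};

/* Stands in for zswap_invalidate(), memcg uncharging, the arch call. */
static void toy_free_hooks(struct toy_swap *s, int slot)
{
	s->zswap_has_copy[slot] = false;
}

/* Hand the batched slots back to the (imaginary) allocator. */
static void toy_drain_cache(struct toy_swap *s)
{
	for (size_t i = 0; i < s->nr_cached; i++) {
		if (!s->hook_on_last_put)
			toy_free_hooks(s, s->cached[i]);
	}
	s->nr_cached = 0;
}

/* Drop one reference; on the last one the slot goes into the cache. */
static void toy_put_slot(struct toy_swap *s, int slot)
{
	assert(s->count[slot] > 0);
	if (--s->count[slot] != 0)
		return;

	if (s->hook_on_last_put)
		toy_free_hooks(s, slot);

	if (s->nr_cached == CACHE_BATCH)
		toy_drain_cache(s);
	s->cached[s->nr_cached++] = slot;
}

/* Count slots that are free but still have a zswap copy attached. */
static int toy_stale_copies(const struct toy_swap *s)
{
	int n = 0;

	for (int i = 0; i < NR_SLOTS; i++) {
		if (s->count[i] == 0 && s->zswap_has_copy[i])
			n++;
	}
	return n;
}
```

With hook_on_last_put == false, a slot whose count just dropped to 0
still reports a stale copy until toy_drain_cache() runs; with it set to
true, the batching is invisible to the code standing in for zswap.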

For case (2), I don't think zswap can just decide to free the entry.

In that case, the page is in the swap cache pointing to a swp_entry
which has a corresponding zswap entry, and the page is clean because
it is already in swap/zswap, so we don't need to write it out again
unless it is redirtied. If zswap just drops the entry, and reclaim
tries to reclaim the page in the swap cache, it will drop the page
assuming that there is a copy in swap/zswap (because it is clean). Now
we have lost all copies of the page.
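
The sequence can be sketched as a toy userspace model. This is not
kernel code: toy_page_state, toy_zswap_drop(), toy_reclaim() and
toy_copies() are hypothetical names, and reclaim is reduced to "write
out if dirty, then drop the page from the swap cache".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_page_state {
	bool in_swap_cache;  /* uncompressed page sits in the swap cache */
	bool dirty;          /* page was modified since it was swapped in */
	bool zswap_copy;     /* compressed copy held by zswap */
	bool swap_copy;      /* copy on the backing swap device */
};

/* What the patch would do in case (2): free the zswap entry. */
static void toy_zswap_drop(struct toy_page_state *p)
{
	p->zswap_copy = false;
}

/*
 * Simplified reclaim of a swap cache page: a dirty page is written out
 * first, a clean page is dropped on the assumption that swap/zswap
 * still holds its contents.
 */
static void toy_reclaim(struct toy_page_state *p)
{
	if (!p->in_swap_cache)
		return;

	if (p->dirty) {
		p->swap_copy = true;
		p->dirty = false;
	}
	p->in_swap_cache = false;
}

/* Number of places the page contents can still be found. */
static int toy_copies(const struct toy_page_state *p)
{
	return (int)p->in_swap_cache + (int)p->zswap_copy + (int)p->swap_copy;
}
```

Starting from the state after a non-exclusive load (in_swap_cache and
zswap_copy set, dirty clear), calling toy_zswap_drop() followed by
toy_reclaim() leaves toy_copies() at 0. Without the drop, the zswap copy
survives reclaim.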

Am I missing something?

>
> The above two cases need to be determined by swapcount=0,
> fix it.
>
> Signed-off-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx>
> ---
> mm/zswap.c | 35 +++++++++++++++++++++++++----------
> 1 file changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 74411dfdad92..db95491bcdd5 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1063,11 +1063,12 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> struct mempolicy *mpol;
> struct scatterlist input, output;
> struct crypto_acomp_ctx *acomp_ctx;
> + struct swap_info_struct *si;
> struct zpool *pool = zswap_find_zpool(entry);
> bool page_was_allocated;
> u8 *src, *tmp = NULL;
> unsigned int dlen;
> - int ret;
> + int ret = 0;
> struct writeback_control wbc = {
> .sync_mode = WB_SYNC_NONE,
> };
> @@ -1082,16 +1083,30 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> mpol = get_task_policy(current);
> page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
> NO_INTERLEAVE_INDEX, &page_was_allocated);
> - if (!page) {
> + if (!page)
> ret = -ENOMEM;
> - goto fail;
> - }
> -
> - /* Found an existing page, we raced with load/swapin */
> - if (!page_was_allocated) {
> + else if (!page_was_allocated) {
> + /* Found an existing page, we raced with load/swapin */
> put_page(page);
> ret = -EEXIST;
> - goto fail;
> + }
> +
> + if (ret) {
> + si = get_swap_device(swpentry);
> + if (!si)
> + goto out;
> +
> + /* Two cases to directly release zswap_entry.
> + * 1) -ENOMEM,if the swpentry has been freed, but cached in
> + * swap_slots_cache(no page and swapcount = 0).
> + * 2) -EEXIST, option zswap_exclusive_loads_enabled disabled and
> + * zswap_load completed(page in swap_cache and swapcount = 0).
> + */
> + if (!swap_swapcount(si, swpentry))
> + ret = 0;
> +
> + put_swap_device(si);
> + goto out;
> }
>
> /*
> @@ -1106,7 +1121,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> spin_unlock(&tree->lock);
> delete_from_swap_cache(page_folio(page));
> ret = -ENOMEM;
> - goto fail;
> + goto out;
> }
> spin_unlock(&tree->lock);
>
> @@ -1151,7 +1166,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>
> return ret;
>
> -fail:
> +out:
> if (!zpool_can_sleep_mapped(pool))
> kfree(tmp);
>
> --
> 2.25.1
>