Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios

From: Yosry Ahmed
Date: Thu Nov 16 2023 - 15:19:22 EST


On Thu, Nov 16, 2023 at 12:12 PM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> Hi Yosry,
>
> On Tue, Nov 14, 2023 at 9:16 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > > 1) The swap entry has been freed but is still cached in swap_slots_cache,
> > > with no swap cache and swapcount = 0.
> > > 2) The zswap_exclusive_loads_enabled option is disabled and
> > > zswap_load has completed (page in swap_cache and swapcount = 0).
> >
> > For case (1), I think a cleaner solution would be to move the
> > zswap_invalidate() call from swap_range_free() (which is called after
> > the cached slots are freed) to __swap_entry_free_locked() if the usage
> > goes to 0. I actually think conceptually this makes sense not just for
> > zswap_invalidate(), but also for the arch call, memcg uncharging, etc.
> > Slots caching is a swapfile optimization that should be internal to
> > swapfile code. Once a swap entry is freed (i.e. swap count is 0 AND
>
> Do you mean moving all swap slot frees to bypass the swap slot cache, even
> if it is not from zswap? That might have unwanted side effects. The swap
> slot cache is not just for swap files on disk. The batching has the
> effect of lowering the average cost of freeing per entry.

Not bypassing the swap slot cache, just making the callbacks to
invalidate the zswap entry, do memcg uncharging, etc. when the slot is
no longer used and is entering the swap slot cache (i.e. when
free_swap_slot() is called), instead of when draining the swap slot
cache (i.e. when swap_range_free() is called). For all parts of MM
outside of swap, the swap entry is freed when free_swap_slot() is
called. We don't free it immediately because of caching, but this
should be transparent to other parts of MM (e.g. zswap, memcg, etc).
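
To make the ordering concrete, here is a rough user-space toy model, not
kernel code. Every toy_* name is made up for illustration; the real
functions (free_swap_slot(), swap_range_free(), zswap_invalidate()) only
appear in comments, and the cache size and slot count are arbitrary
assumptions. It just counts how many freed slots still have a stale
"zswap entry" depending on where the hooks run.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS   8	/* arbitrary toy values, not kernel constants */
#define CACHE_SIZE 4

static bool zswap_has_entry[NR_SLOTS];	/* pretend zswap still holds data */
static int slot_cache[CACHE_SIZE];	/* pretend swap slot cache */
static int nr_cached;

/* Stand-in for zswap_invalidate(), memcg uncharging, the arch hook, ... */
static void toy_run_free_hooks(int slot)
{
	zswap_has_entry[slot] = false;
}

/* Stand-in for draining the cache, i.e. the swap_range_free() path. */
static void toy_drain_cache(bool hooks_at_drain)
{
	for (int i = 0; i < nr_cached; i++) {
		if (hooks_at_drain)	/* current placement of the hooks */
			toy_run_free_hooks(slot_cache[i]);
	}
	nr_cached = 0;
}

/* Stand-in for free_swap_slot(): the slot enters the slot cache. */
static void toy_free_swap_slot(int slot, bool hooks_at_drain)
{
	assert(slot >= 0 && slot < NR_SLOTS);

	if (!hooks_at_drain)	/* suggested placement of the hooks */
		toy_run_free_hooks(slot);

	if (nr_cached == CACHE_SIZE)	/* batch is full: drain it first */
		toy_drain_cache(hooks_at_drain);
	slot_cache[nr_cached++] = slot;
}

/*
 * Free slots 0..nr_frees-1 and return how many of them still have a
 * stale zswap entry afterwards.
 */
int toy_stale_after_frees(bool hooks_at_drain, int nr_frees)
{
	int stale = 0;

	assert(nr_frees >= 0 && nr_frees <= NR_SLOTS);

	nr_cached = 0;
	for (int i = 0; i < NR_SLOTS; i++)
		zswap_has_entry[i] = true;

	for (int i = 0; i < nr_frees; i++)
		toy_free_swap_slot(i, hooks_at_drain);

	for (int i = 0; i < nr_frees; i++)
		if (zswap_has_entry[i])
			stale++;
	return stale;
}
```

In this toy, toy_stale_after_frees(true, 3) returns 3 because the cache
never fills and nothing gets invalidated, while
toy_stale_after_frees(false, 3) returns 0. The batching itself is
unchanged in both modes; only the point where the hooks run moves.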

>
> > not in the swap cache), all the hooks should be called (memcg, zswap,
> > arch, ..) as the swap entry is effectively freed. The fact that
> > swapfile code internally batches and caches slots should be
> > transparent to other parts of MM. I am not sure if the calls can just
> > be moved or if there are underlying assumptions in the implementation
> > that would be broken, but it feels like the right thing to do.
>
> There is also the behavior that if the page gets swapped in but hasn't
> changed, when it is swapped out again, it is possible to avoid writing
> the page to the disk again. For disk there is no overhead in keeping the
> old data on the disk untouched. For zpool it might have memory
> overhead from holding the compressed data in the pool. The trade-off
> might be different.
>
> Chris