Re: [PATCH 0/2] minimize swapping on zswap store failure

From: Nhat Pham
Date: Tue Oct 17 2023 - 15:04:52 EST


On Tue, Oct 17, 2023 at 12:03 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Mon, Oct 16, 2023 at 5:58 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> > >
> > > Currently, when a zswap store attempt fails, the page is immediately
> > > swapped out. This could happen for a variety of reasons. For instance,
> > > the compression algorithm could fail (such as when the data is not
> > > compressible), or the backend allocator might not be able to find a
> > > suitable slot for the compressed page. If these pages are needed
> > > later on, users will incur IOs from swapins.
> > >
> > > This issue prevents the adoption of zswap for potential users who
> > > cannot tolerate the latency associated with swapping. In many cases,
> > > these IOs are avoidable if we just keep in memory the pages that zswap
> > > fail to store.
> > >
> > > This patch series add two new features for zswap that will alleviate
> > > the risk of swapping:
> > >
> > > a) When a store attempt fail, keep the page untouched in memory
> > > instead of swapping it out.
> >
> > What about writeback when the zswap limit is hit? I understand the
> > problem, but I am wondering if this is the correct way of fixing it.
> > We really need to make zswap work without a backing swapfile, which I
> > think is the correct way to fix all these problems. I was working on
> > that, but unfortunately I had to pivot to something else before I had
> > something that was working.
> >
> > At Google, we have "ghost" swapfiles that we use just to use zswap
> > without a swapfile. They are sparse files, and we have internal kernel
> > patches to flag them and never try to actually write to them.
> >
> > I am not sure how many bandaids we can afford before doing the right
> > thing. I understand it's a much larger surgery, perhaps there is a way
> > to get a short-term fix that is also a step towards the final state we
> > want to reach instead?
>
> Regarding the writeback - I'll make sure to also short-circuit writeback
> when the bypass_swap option is enabled in v2 :) I'll probably send out
> v2 after

... I gather all these feedbacks.

> I absolutely agree that we must decouple zswap and swap (and would
> be happy to help out in any capacity I could - we have heard similar
> concerns/complaints about swap wastage from internal parties as well).
>
> However, as Johannes has pointed out, this feature still has its place,
> given our already existing swapfile deployments. I do agree that a
> global knob is insufficient tho. I'll add a per-cgroup knob in v2 so that
> we can enable/disable this feature on a per-workload basis.
>
> >
> > >
> > > b) If the store attempt fails at the compression step, allow the page
> > > to be stored in its uncompressed form in the zswap pool. This maintains
> > > the LRU ordering of pages, which will be helpful for accurate
> > > memory reclaim (zswap writeback in particular).
> >
> > This is dangerous. Johannes and I discussed this before. This means
> > that reclaim can end up allocating more memory instead of freeing.
> > Allocations made in the reclaim path are made under the assumption
> > that we will eventually free memory. In this case, we won't. In the
> > worst case scenario, reclaim can leave the system/memcg in a worse
> > state than before it started.
> >
> > Perhaps there is a way we can do this without allocating a zswap entry?
> >
> > I thought before about having a special list_head that allows us to
> > use the lower bits of the pointers as markers, similar to the xarray.
> > The markers can be used to place different objects on the same list.
> > We can have a list that is a mixture of struct page and struct
> > zswap_entry. I never pursued this idea, and I am sure someone will
> > scream at me for suggesting it. Maybe there is a less convoluted way
> > to keep the LRU ordering intact without allocating memory on the
> > reclaim path.
>
> Hmm yeah you're right about these concerns. That seems like a lot more
> involved than what I envisioned initially.
>
> Let's put this aside for now. I'll just send the first patch in v2, and we can
> work on + discuss more about uncompressed pages storing later on.
>
> >
> > >
> > > These features could be enabled independently via two new zswap module
> > > parameters.
> > >
> > > Nhat Pham (2):
> > > swap: allows swap bypassing on zswap store failure
> > > zswap: store uncompressed pages when compression algorithm fails
> > >
> > > Documentation/admin-guide/mm/zswap.rst | 16 +++++++
> > > include/linux/zswap.h | 9 ++++
> > > mm/page_io.c | 6 +++
> > > mm/shmem.c | 8 +++-
> > > mm/zswap.c | 64 +++++++++++++++++++++++---
> > > 5 files changed, 95 insertions(+), 8 deletions(-)
> > >
> > > --
> > > 2.34.1