Re: [PATCH v5] zswap: memcontrol: implement zswap writeback disabling

From: Nhat Pham
Date: Sun Nov 19 2023 - 14:13:09 EST


On Sun, Nov 19, 2023 at 1:39 AM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> On Sat, Nov 18, 2023 at 11:23 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> >
> > Hmm how about this - in the future, we support the following
> > options:
> >
> > 1. zswap.writeback == 1: no limitation to zswap writeback.
> > All backing swap devices (sorted by priorities?) are fair game.
> >
> > 2. zswap.writeback == 0: disable all forms of zswap writeback.
> >
> > 3. zswap.writeback == <tiers description>: attempt to write to each
> > tier, one at a time.
>
> We can merge the zswap.writeback as it is for now to unblock you.
>
> For the future. I think we should remove zswap.writeback completely.

I'm a bit weary about API changes, especially changes that affect
backward compatibility. Breaking existing userspace programs simply
with a kernel upgrade does not sound very nice to me.

(although I've heard that the eventual plan is to deprecate cgroupv1
- not sure how that is gonna proceed).

Hence my attempt at creating something that can both serve the
current use case, while still remaining (fairly) extensible for future
ideas.

>
> Instead we have:
>
> swap.tiers == <swap_tier_list_name>
> swap.tiers == "all" all available swap tiers. "zswap + swap file".
> This is the default.
> swap.tiers == "zswap" zswap only, no other swap file. Internally set
> zswap.writeback = 0
> swap.tiers == "foo" foo is a list of swap devices it can use. You can
> define your town custom swap tier list in
> swap.tiers == "none" or "disabled" Not allowed to swap.

swap.tiers == "none" or "disabled" means disallowing zswap as
well, correct?

>
> "all", "zswap", "none" are reserved keywords.
> "foo", "bar" etc are custom lists of swap tiers. User define custom
> tier list in sys/kernel/mm/swap/tiers:
> ssd:zswap,/dev/nvme01p4
> hdd:/dev/sda4,/dev/sdb4

I don't have any major argument against this. It just seems a bit
heavyweight for what we need at the moment (only disabling
swap-to-disk usage).

I'll let other people weigh in about this of course.
Johannes, how do you feel about this proposed API?

>
> That would define two custom tiers. "ssd" can use zswap then /dev/nvme01p4.
> The exact name of the "swap.tiers" and tiers name are open to suggestions.
>
> >
> > The first two are basically what we have for this patch.
> > The last one will be added in a future patch.
> >
> > This is from the userspace perspective. Internally, we can modify
> > memcg->writeback to be a pointer or a struct instead of this bool.
> > (as you suggested).
>
> Internally I would suggest memcg->swaptiers, the write back name is
> somewhat confusing. As your patch indicated. It has two situation:
> 1. shrinking from zpool to real swapfile. The write back is appropriate here.
> 2. zswap store failed (compression ratio too low, out of memory etc).
> The write back is confusing here. It is more like writing through or
> skip.
>
> >
> > This way, the API remains intact and backward compatible
> > (and FWIW, I think there are still a lot of values in having simple
> > options for the users who have simple memory hierarchies).
>
> swap.tiers can be simple. For example, you can modify your patch to
> "swap.tires == zswap" to
> set zswap.writeback bool to 0 for now. Most of your patch is still re-usable.
>

I'm less concerned about internals - that is always up to changes.
I'm a bit more concerned with the API we're exposing to the users.

> I think we should discuss if we want to keep zswap.writeback in the
> future because that would be some code undeletable and functionally
> overlap with swap.tiers

This is a fair point.

>
> Chris