Re: [PATCH v5] zswap: memcontrol: implement zswap writeback disabling

From: Chris Li
Date: Sun Nov 19 2023 - 16:51:32 EST


On Sun, Nov 19, 2023 at 11:08 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Sun, Nov 19, 2023 at 1:39 AM Chris Li <chrisl@xxxxxxxxxx> wrote:
> >
> > On Sat, Nov 18, 2023 at 11:23 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> > >
> > > Hmm how about this - in the future, we support the following
> > > options:
> > >
> > > 1. zswap.writeback == 1: no limitation to zswap writeback.
> > > All backing swap devices (sorted by priorities?) are fair game.
> > >
> > > 2. zswap.writeback == 0: disable all forms of zswap writeback.
> > >
> > > 3. zswap.writeback == <tiers description>: attempt to write to each
> > > tier, one at a time.
> >
> > We can merge the zswap.writeback as it is for now to unblock you.
> >
> > For the future. I think we should remove zswap.writeback completely.
>
> I'm a bit weary about API changes, especially changes that affect
> backward compatibility. Breaking existing userspace programs simply
> with a kernel upgrade does not sound very nice to me.
>
> (although I've heard that the eventual plan is to deprecate cgroupv1
> - not sure how that is gonna proceed).
>
> Hence my attempt at creating something that can both serve the
> current use case, while still remaining (fairly) extensible for future
> ideas.

With that reasoning, the "swap.tiers" would serve better than "zswap.writeback",
do you think so?

>
> >
> > Instead we have:
> >
> > swap.tiers == <swap_tier_list_name>
> > swap.tiers == "all" all available swap tiers. "zswap + swap file".
> > This is the default.
> > swap.tiers == "zswap" zswap only, no other swap file. Internally set
> > zswap.writeback = 0
> > swap.tiers == "foo" foo is a list of swap devices it can use. You can
> > define your town custom swap tier list in
> > swap.tiers == "none" or "disabled" Not allowed to swap.
>
> swap.tiers == "none" or "disabled" means disallowing zswap as
> well, correct?

Correct, no swap at all.

>
> >
> > "all", "zswap", "none" are reserved keywords.
> > "foo", "bar" etc are custom lists of swap tiers. User define custom
> > tier list in sys/kernel/mm/swap/tiers:
> > ssd:zswap,/dev/nvme01p4
> > hdd:/dev/sda4,/dev/sdb4
>
> I don't have any major argument against this. It just seems a bit
> heavyweight for what we need at the moment (only disabling
> swap-to-disk usage).

The first milestone we just implement the reserved keywords without
the custom swap tier list.
That should be very similar to "zswap.writeback". Instead of writing 0
to "zswap.writeback".
You write "zswap" to "swap.tiers". Writing "none" will disable all
swap. Writing "all" will allow all swap devices.
I consider this conceptually cleaner than the "zswap.writeback" == 0
will also disable other swap types behavior. "disabled zswap writeback
== disable all swap" feels less natural.

> I'll let other people weigh in about this of course.
> Johannes, how do you feel about this proposed API?
>
> >
> > That would define two custom tiers. "ssd" can use zswap then /dev/nvme01p4.
> > The exact name of the "swap.tiers" and tiers name are open to suggestions.
> >
> > >
> > > The first two are basically what we have for this patch.
> > > The last one will be added in a future patch.
> > >
> > > This is from the userspace perspective. Internally, we can modify
> > > memcg->writeback to be a pointer or a struct instead of this bool.
> > > (as you suggested).
> >
> > Internally I would suggest memcg->swaptiers, the write back name is
> > somewhat confusing. As your patch indicated. It has two situation:
> > 1. shrinking from zpool to real swapfile. The write back is appropriate here.
> > 2. zswap store failed (compression ratio too low, out of memory etc).
> > The write back is confusing here. It is more like writing through or
> > skip.
> >
> > >
> > > This way, the API remains intact and backward compatible
> > > (and FWIW, I think there are still a lot of values in having simple
> > > options for the users who have simple memory hierarchies).
> >
> > swap.tiers can be simple. For example, you can modify your patch to
> > "swap.tires == zswap" to
> > set zswap.writeback bool to 0 for now. Most of your patch is still re-usable.
> >
>
> I'm less concerned about internals - that is always up to changes.
> I'm a bit more concerned with the API we're exposing to the users.

Me too. I think we are in agreement here. That is why I think
"swap.tiers" is more general.

>
> > I think we should discuss if we want to keep zswap.writeback in the
> > future because that would be some code undeletable and functionally
> > overlap with swap.tiers
>
> This is a fair point.

If you think we have the risk of not being able to obsolete
"zswap.writeback", then it would be much better not to introduce
"zswap.writeback" in the first place. Just have a minimal
implementation of "swap.tiers" instead. I believe from the code
complexity point of view, the complexity should be very similar.

Chris