Re: [PATCHv4 0/9] zsmalloc/zram: configurable zspage size

From: Sergey Senozhatsky
Date: Sun Nov 13 2022 - 22:54:09 EST


Hi Minchan,

On (22/11/11 09:03), Minchan Kim wrote:
> > Sorry, not sure I'm following. So you want a .config value
> > for zspage limit? I really like the sysfs knob, because then
> > one may set values on per-device basis (if they have multiple
> > zram devices in a system with different data patterns):
>
> Yes, I wanted to have just a global policy to drive zsmalloc smarter
> without needing user's big effort to decide right tune value(I thought
> the decision process would be quite painful for normal user who don't
> have enough resources) since zsmalloc's design makes it possible.
> But for the interim solution until we prove no regression, just
> provide config and then remove the config later when we add aggressive
> zpage compaction(if necessary, please see below) since it's easier to
> deprecate syfs knob.

[..]

> I understand what you want to achieve with per-pool config with exposing
> the knob to user but my worry is still how user could decide best fit
> since workload is so dynamic. Some groups have enough resouces to practice
> under fleet experimental while many others don't so if we really need the
> per-pool config step, at least, I'd like to provide default guide to user
> in the documentation along with the tunable knobs for experimental.
> Maybe, we can suggest 4 for swap case and 8 for fs case.
>
> I don't disagree the sysfs knobs for use cases but can't we deal with the
> issue better way?

[..]

> with *aggressive zpage compaction*. Now, we are relying on shrinker
> (it might be already enough) to trigger but we could change the policy
> wasted memory in the class size crossed a threshold we defind for zram fs
> usecase since it would be used without memory pressure.
>
> What do you think about?

This is tricky. I didn't want us to come up with any sort of policies
based on assumptions. For instance, we know that SUSE uses zram with fs
under severe memory pressure (so severe that they immediately noticed
when we removed zsmalloc handle allocation slow path and reported a
regression), so assumption that fs zram use-case is not memory sensitive
does not always hold.

There are too many variables. We have different data patterns, yes, but
even same data patterns have different characteristics when compressed
with different algorithms; then we also have different host states
(memory pressure, etc.) and so on.

I think that it'll be safer for us to execute it the other way.
We can (that's what I was going to do) reach out to people (Android,
SUSE, Meta, ChromeOS, Google cloud, WebOS, Tizen) and ask them to run
experiments (try out various numbers). Then (several months later) we
can take a look at the data - what numbers work for which workloads,
and then we can introduce/change policies, based on evidence and real
use cases. Who knows, maybe zspage_chain_size of 6 can be the new
default and then we can add .config policy, maybe 7 or 8. Or maybe we
won't find a single number that works equally well for everyone (even
in similar use cases).

This is where sysfs knob is very useful. Unlike .config, which has no
flexibility especially when your entire fleet uses same .config for all
builds, sysfs knob lets people run numerous A/B tests simultaneously
(not to mention that some setups have many zram devices which can have
different zspage_chain_size-s). And we don't even need to deprecate it,
if we introduce a generic one like allocator_tunables, which will
support tuples `key=val`. Then we can just deprecate a specific `key`.