[PATCH v3 0/2] mm/zswap: optimize for dynamic zswap_pools

From: Chengming Zhou
Date: Fri Feb 16 2024 - 03:55:52 EST


Changes in v3:
- Improve the commit messages and comments, per Yosry.
- Use percpu_ref_is_zero() for debug purpose, per Yosry.
- Collect tag.
- Link to v2: https://lore.kernel.org/r/20240210-zswap-global-lru-v2-0-fbee3b11a62e@xxxxxxxxxxxxx

Changes in v2:
- fix build error when !CONFIG_MEMCG_KMEM.
- make zswap struct static and fix some error paths, per Yosry.
- add another shrink_lock to protect zswap.next_shrink, per Yosry.
- keep "WARN_ON(percpu_ref_tryget(&pool->ref))" in pool release path
for debug, per Nhat.
- improve the commit messages.
- Link to v1: https://lore.kernel.org/r/20240210-zswap-global-lru-v1-0-853473d7b0da@xxxxxxxxxxxxx

Dynamic pool creation has been supported for a long time, although it
is probably not used much in practice. But with the per-memcg lru merged,
the current structure of zswap_pool's lru and shrinker has become less
optimal.

In the current structure, each zswap_pool has its own lru, shrinker and
shrink_work, but only the latest zswap_pool is the one currently in use.

1. Under memory pressure, the shrinkers of all zswap_pools try to
shrink their own lru lists, with no ordering between them.

2. When the zswap limit is hit, only the latest zswap_pool's shrink_work
tries to shrink its own lru, which is inefficient.

A more natural way is to have one global zswap lru shared by all
zswap_pools, along with a single shared shrinker. The code becomes much
simpler too.
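
The idea of one lru shared by all pools can be sketched in user space
roughly as follows. This is only an illustration under simplifying
assumptions: struct pool, struct entry, struct global_lru, lru_add() and
lru_shrink() are hypothetical names, there is no locking and no per-memcg
handling, and none of this is the code in mm/zswap.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
struct pool {
	int id;
	int nr_entries;
};

struct entry {
	struct pool *pool;		/* pool that holds the compressed data */
	struct entry *prev, *next;	/* links in the single global lru */
};

/* One lru for all pools: head is the oldest entry, tail the newest. */
struct global_lru {
	struct entry *head, *tail;
	size_t len;
};

/* Add a newly stored entry at the tail (most recently used end). */
static void lru_add(struct global_lru *lru, struct entry *e, struct pool *p)
{
	e->pool = p;
	e->next = NULL;
	e->prev = lru->tail;
	if (lru->tail)
		lru->tail->next = e;
	else
		lru->head = e;
	lru->tail = e;
	lru->len++;
	p->nr_entries++;
}

/* Evict up to nr of the oldest entries, whichever pool they belong to. */
static size_t lru_shrink(struct global_lru *lru, size_t nr)
{
	size_t evicted = 0;

	while (evicted < nr && lru->head) {
		struct entry *e = lru->head;

		lru->head = e->next;
		if (lru->head)
			lru->head->prev = NULL;
		else
			lru->tail = NULL;
		e->prev = e->next = NULL;

		assert(e->pool->nr_entries > 0);
		e->pool->nr_entries--;
		lru->len--;
		evicted++;
	}
	return evicted;
}
```

With two entries added from an old pool and then one from the current
pool, lru_shrink(&lru, 2) in this model evicts the two old-pool entries
first, because age alone decides the order and pool membership plays no
part.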

Another optimization is changing the zswap_pool kref to a percpu_ref.
Every zswap entry takes a reference on its pool, so percpu_ref gives
better scalability.
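
To illustrate why a per-CPU count scales better than one shared counter,
here is a toy single-threaded model. It is a sketch only: the toy_*
names and NR_CPUS value are made up for this example, there is no
atomicity or RCU, and it is not how the kernel's percpu_ref is
implemented. It only mirrors the general idea that gets and puts touch a
per-CPU slot until the ref is killed, after which a shared count is used.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for the model */

/* Toy model of a per-CPU reference count; not the kernel's percpu_ref. */
struct toy_percpu_ref {
	long percpu[NR_CPUS];	/* per-CPU deltas, each may go negative */
	long shared;		/* shared count, used after the ref is killed */
	bool killed;
};

static void toy_ref_init(struct toy_percpu_ref *ref)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		ref->percpu[cpu] = 0;
	ref->shared = 1;	/* the initial reference held by the owner */
	ref->killed = false;
}

/* Fast path: only the local CPU's slot is touched. */
static void toy_ref_get(struct toy_percpu_ref *ref, int cpu)
{
	if (ref->killed)
		ref->shared++;
	else
		ref->percpu[cpu]++;
}

/* Returns true when the last reference has been dropped. */
static bool toy_ref_put(struct toy_percpu_ref *ref, int cpu)
{
	if (!ref->killed) {
		ref->percpu[cpu]--;
		return false;	/* the initial reference is still held */
	}
	assert(ref->shared > 0);
	return --ref->shared == 0;
}

/* Fold the per-CPU deltas into the shared count, drop the initial ref. */
static bool toy_ref_kill(struct toy_percpu_ref *ref)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		ref->shared += ref->percpu[cpu];
		ref->percpu[cpu] = 0;
	}
	ref->killed = true;
	return --ref->shared == 0;
}
```

In this model a get on one CPU and the matching put on another never
touch the same slot, which is the property that avoids contention on a
single shared counter when every entry holds a pool reference.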

Testing: kernel build (32 threads) in tmpfs with memory.max=2GB.
(zswap shrinker and writeback enabled with one 50GB swapfile, on a
128-CPU x86-64 machine; the numbers below are the average of 5 runs.)

        mm-unstable     zswap-global-lru
real    63.20           63.12
user    1061.75         1062.95
sys     268.74          264.44
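
For reading the numbers, the relative change can be computed with a
small helper like the one below (pct_change() is a hypothetical name used
only here). Applied to the averages above it gives roughly -1.6% for
sys, while real and user both move by less than 0.2%.

```c
#include <assert.h>

/*
 * Relative change of `patched` versus `base`, in percent.
 * A negative result means the patched kernel took less time.
 */
static double pct_change(double base, double patched)
{
	assert(base > 0.0);
	return (patched - base) / base * 100.0;
}
```

For example, pct_change(268.74, 264.44) is the sys-time change between
mm-unstable and zswap-global-lru.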

Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
---
Chengming Zhou (2):
      mm/zswap: global lru and shrinker shared by all zswap_pools
      mm/zswap: change zswap_pool kref to percpu_ref

 mm/zswap.c | 207 +++++++++++++++++++++++++++----------------------------------
 1 file changed, 93 insertions(+), 114 deletions(-)
---
base-commit: 191d97734e41a5c9f90a2f6636fdd335ae1d435d
change-id: 20240210-zswap-global-lru-94d49316178b

Best regards,
--
Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>