Re: [RFC] Analyzing zpool allocators / Removing zbud and z3fold

From: Yosry Ahmed
Date: Thu Feb 22 2024 - 00:56:19 EST


On Thu, Feb 22, 2024 at 11:54:44AM +0800, Chengming Zhou wrote:
> On 2024/2/9 11:27, Yosry Ahmed wrote:
> > Hey folks,
> >
> > This is a follow up on my previously sent RFC patch to deprecate
> > z3fold [1]. This is an RFC without code, I thought I could get some
> > discussion going before writing (or rather deleting) more code. I went
> > back to do some analysis on the 3 zpool allocators: zbud, zsmalloc,
> > and z3fold.
>
> This is a great analysis! Sorry for only seeing it now.
>
> I want to vote for this direction. zram has been using zsmalloc directly,
> and zswap can do the same, which is simpler and means we only have to maintain
> and optimize one allocator. The only evident downside is the dependence on MMU, right?

AFAICT, yes. I saw a lot of positive responses when I sent an RFC to
mark z3fold as deprecated, but there were some opposing opinions as
well, which is why I did this simple analysis. I was hoping we could make
forward progress with that, but was disappointed it didn't get as much
attention as the deprecation RFC :)

>
> I'm also currently trying to improve zsmalloc's scalability, which is poor
> enough that zswap has to use 32 pools to work around it. (zram uses only
> one pool, so it should also have the scalability problem on big servers and
> may need many zram block devices to work around it too.)

That's slightly orthogonal. Zsmalloc is not really showing worse
performance than other allocators, so this should be a separate effort.

>
> But too many pools cause more memory waste and more fragmentation,
> so the resulting compression ratio is not good enough.
>
> As for the MMU dependence, can we actually avoid it? Maybe I missed something,
> but we could get the object's memory vecs from zsmalloc and pass them to the
> decompressor, which would then need to support length(memory vecs) > 1?

IIUC the dependency on MMU is due to the use of kmalloc() APIs and the
fact that we may be using highmem pages. I think we may be able to work
around that dependency but I didn't look closely. Hopefully Minchan or
Sergey could shed more light on this.

>
> >
> > [1]https://lore.kernel.org/linux-mm/20240112193103.3798287-1-yosryahmed@xxxxxxxxxx/
> >
> > In this analysis, for each of the allocators I ran a kernel build test
> > on tmpfs in a memory-limited cgroup 5 times and captured:
> > (a) The build times.
> > (b) zswap_load() and zswap_store() latencies using bpftrace.
> > (c) The maximum size of the zswap pool from /proc/meminfo::Zswapped.
>
> This should be /proc/meminfo::Zswap, right?
> Zswap is the total size of the pool pages, while Zswapped is the (uncompressed)
> size of the pages that were swapped out and stored in zswap.

Oh yes, it is /proc/meminfo::Zswap actually. I miswrote it in my email.
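For anyone reproducing (c), here is a minimal sketch of one way to capture it. It is not the exact tooling used for the numbers above. It assumes you poll /proc/meminfo periodically while the build runs and keep the peak of the Zswap field (pool size), recording Zswapped (uncompressed size of the stored pages) at the same moment. Zswapped / Zswap then gives a rough compression ratio at the peak. The function names and the polling interval are hypothetical.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live /proc/meminfo (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


def track_peak_zswap(snapshots):
    """Return (peak Zswap, Zswapped at that peak) over parsed snapshots.

    Zswap is the zswap pool size; Zswapped is the uncompressed size of the
    pages stored in it, so their ratio approximates the compression ratio.
    """
    peak_pool = 0
    stored_at_peak = 0
    for snap in snapshots:
        pool = snap.get("Zswap", 0)
        if pool > peak_pool:
            peak_pool = pool
            stored_at_peak = snap.get("Zswapped", 0)
    return peak_pool, stored_at_peak


def poll(interval_s, count, reader=read_meminfo):
    """Yield `count` parsed snapshots, sleeping `interval_s` seconds between them."""
    for i in range(count):
        if i:
            time.sleep(interval_s)
        yield reader()
```

Usage would be something like `peak, stored = track_peak_zswap(poll(1.0, 600))` while the build runs in the cgroup. The polling approach can miss short-lived peaks between samples, so a shorter interval (or a tracepoint-based approach) would be more precise.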

Thanks!