RE: [PATCH] zsmalloc: do not use bit_spin_lock

From: Song Bao Hua (Barry Song)
Date: Mon Dec 21 2020 - 16:30:49 EST




> -----Original Message-----
> From: Shakeel Butt [mailto:shakeelb@xxxxxxxxxx]
> Sent: Tuesday, December 22, 2020 10:03 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> Cc: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>; Minchan Kim <minchan@xxxxxxxxxx>;
> Mike Galbraith <efault@xxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; linux-mm
> <linux-mm@xxxxxxxxx>; Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>;
> NitinGupta <ngupta@xxxxxxxxxx>; Sergey Senozhatsky
> <sergey.senozhatsky.work@xxxxxxxxx>; Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx>
> Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
>
> On Mon, Dec 21, 2020 at 12:06 PM Song Bao Hua (Barry Song)
> <song.bao.hua@xxxxxxxxxxxxx> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Shakeel Butt [mailto:shakeelb@xxxxxxxxxx]
> > > Sent: Tuesday, December 22, 2020 8:50 AM
> > > To: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
> > > Cc: Minchan Kim <minchan@xxxxxxxxxx>; Mike Galbraith <efault@xxxxxx>; LKML
> > > <linux-kernel@xxxxxxxxxxxxxxx>; linux-mm <linux-mm@xxxxxxxxx>; Song Bao
> Hua
> > > (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>; Sebastian Andrzej Siewior
> > > <bigeasy@xxxxxxxxxxxxx>; NitinGupta <ngupta@xxxxxxxxxx>; Sergey
> Senozhatsky
> > > <sergey.senozhatsky.work@xxxxxxxxx>; Andrew Morton
> > > <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > >
> > > On Mon, Dec 21, 2020 at 11:20 AM Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
> wrote:
> > > >
> > > > On Mon, Dec 21, 2020 at 6:24 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Sun, Dec 20, 2020 at 02:22:28AM +0200, Vitaly Wool wrote:
> > > > > > zsmalloc takes bit spinlock in its _map() callback and releases it
> > > > > > only in unmap() which is unsafe and leads to zswap complaining
> > > > > > about scheduling in atomic context.
> > > > > >
> > > > > > To fix that and to improve RT properties of zsmalloc, remove that
> > > > > > bit spinlock completely and use a bit flag instead.
> > > > >
> > > > > I don't want to use such open code for the lock.
> > > > >
> > > > > I see from Mike's patch, recent zswap change introduced the lockdep
> > > > > splat bug and you want to improve zsmalloc to fix the zswap bug and
> > > > > introduce this patch with allowing preemption enabling.
> > > >
> > > > This understanding is upside down. The code in zswap you are referring
> > > > to is not buggy. You may claim that it is suboptimal but there is
> > > > nothing wrong in taking a mutex.
> > > >
> > >
> > > Is this suboptimal for all or just the hardware accelerators? Sorry, I
> > > am not very familiar with the crypto API. If I select lzo or lz4 as a
> > > zswap compressor will the [de]compression be async or sync?
> >
> > Right now, in crypto subsystem, new drivers are required to write based on
> > async APIs. The old sync API can't work in new accelerator drivers as they
> > are not supported at all.
> >
> > Old drivers are used to sync, but they've got async wrappers to support async
> > APIs. Eg.
> > crypto: acomp - add support for lz4 via scomp
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> crypto/lz4.c?id=8cd9330e0a615c931037d4def98b5ce0d540f08d
> >
> > crypto: acomp - add support for lzo via scomp
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> crypto/lzo.c?id=ac9d2c4b39e022d2c61486bfc33b730cfd02898e
> >
> > so they are supporting async APIs but they are still working in sync mode
> as
> > those old drivers don't sleep.
> >
>
> Good to know that those are sync because I want them to be sync.
> Please note that zswap is a cache in front of a real swap and the load
> operation is latency sensitive as it comes in the page fault path and
> directly impacts the applications. I doubt decompressing synchronously
> a 4k page on a cpu will be costlier than asynchronously decompressing
> the same page from hardware accelerators.

If you read the old paper:
https://www.ibm.com/support/pages/new-linux-zswap-compression-functionality
it says:

"Because the hardware accelerator speeds up compression, looking at the zswap
metrics we observed that there were more store and load requests in a given
amount of time, which filled up the zswap pool faster than a software
compression run. Because of this behavior, we set the max_pool_percent
parameter to 30 for the hardware compression runs - this means that zswap
can use up to 30% of the 10GB of total memory."

So with hardware accelerators, we get a chance to speed up compression
while decreasing CPU utilization.

BTW, if it is not easy to change zsmalloc, one quick workaround we might do
in zswap is to add the below after applying Mike's original patch:

if (in_atomic()) {			/* for zsmalloc */
	while (!try_wait_for_completion(&req->done))
		cpu_relax();
} else {				/* for zbud, z3fold */
	crypto_wait_req(....);
}
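
To make that a bit more concrete, the check could be wrapped into a small
helper. This is only a sketch: the name zswap_wait_req() is made up, and it
assumes the request's completion is the one embedded in struct crypto_wait
(i.e. the completion that crypto_req_done() signals):

/*
 * Sketch only: drive an async (de)compression request to completion.
 * When called in atomic context (e.g. between zs_map_object() and
 * zs_unmap_object()), spin on the completion instead of sleeping.
 */
static int zswap_wait_req(int err, struct crypto_wait *wait)
{
	if (err != -EINPROGRESS && err != -EBUSY)
		return err;

	if (in_atomic()) {		/* for zsmalloc */
		while (!try_wait_for_completion(&wait->completion))
			cpu_relax();
		reinit_completion(&wait->completion);
		return wait->err;
	}

	return crypto_wait_req(err, wait);	/* for zbud, z3fold */
}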

crypto_wait_req() is actually doing wait_for_completion():
static inline int crypto_wait_req(int err, struct crypto_wait *wait)
{
	switch (err) {
	case -EINPROGRESS:
	case -EBUSY:
		wait_for_completion(&wait->completion);
		reinit_completion(&wait->completion);
		err = wait->err;
		break;
	}

	return err;
}
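
For reference, the usual caller-side pattern with the acomp API looks roughly
like the below (from memory, not quoted from zswap.c), so any caller that may
be in atomic context would need something like the workaround above instead of
calling crypto_wait_req() directly:

struct crypto_wait wait;

crypto_init_wait(&wait);
acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
			   crypto_req_done, &wait);
/* crypto_wait_req() may sleep here if the driver completes asynchronously */
ret = crypto_wait_req(crypto_acomp_decompress(req), &wait);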

Thanks
Barry