RE: [PATCH] zsmalloc: do not use bit_spin_lock

From: Song Bao Hua (Barry Song)
Date: Mon Dec 21 2020 - 18:08:47 EST




> -----Original Message-----
> From: Shakeel Butt [mailto:shakeelb@xxxxxxxxxx]
> Sent: Tuesday, December 22, 2020 11:46 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> Cc: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>; Minchan Kim <minchan@xxxxxxxxxx>;
> Mike Galbraith <efault@xxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; linux-mm
> <linux-mm@xxxxxxxxx>; Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>;
> NitinGupta <ngupta@xxxxxxxxxx>; Sergey Senozhatsky
> <sergey.senozhatsky.work@xxxxxxxxx>; Andrew Morton
> <akpm@xxxxxxxxxxxxxxxxxxxx>
> Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
>
> On Mon, Dec 21, 2020 at 1:30 PM Song Bao Hua (Barry Song)
> <song.bao.hua@xxxxxxxxxxxxx> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Shakeel Butt [mailto:shakeelb@xxxxxxxxxx]
> > > Sent: Tuesday, December 22, 2020 10:03 AM
> > > To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> > > Cc: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>; Minchan Kim <minchan@xxxxxxxxxx>;
> > > Mike Galbraith <efault@xxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>; linux-mm
> > > <linux-mm@xxxxxxxxx>; Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>;
> > > NitinGupta <ngupta@xxxxxxxxxx>; Sergey Senozhatsky
> > > <sergey.senozhatsky.work@xxxxxxxxx>; Andrew Morton
> > > <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > >
> > > On Mon, Dec 21, 2020 at 12:06 PM Song Bao Hua (Barry Song)
> > > <song.bao.hua@xxxxxxxxxxxxx> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Shakeel Butt [mailto:shakeelb@xxxxxxxxxx]
> > > > > Sent: Tuesday, December 22, 2020 8:50 AM
> > > > > To: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
> > > > > Cc: Minchan Kim <minchan@xxxxxxxxxx>; Mike Galbraith <efault@xxxxxx>;
> > > > > LKML <linux-kernel@xxxxxxxxxxxxxxx>; linux-mm <linux-mm@xxxxxxxxx>;
> > > > > Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>; Sebastian
> > > > > Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>; NitinGupta <ngupta@xxxxxxxxxx>;
> > > > > Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx>; Andrew Morton
> > > > > <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > > >
> > > > > On Mon, Dec 21, 2020 at 11:20 AM Vitaly Wool <vitaly.wool@xxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, Dec 21, 2020 at 6:24 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Sun, Dec 20, 2020 at 02:22:28AM +0200, Vitaly Wool wrote:
> > > > > > > > zsmalloc takes bit spinlock in its _map() callback and releases it
> > > > > > > > only in unmap() which is unsafe and leads to zswap complaining
> > > > > > > > about scheduling in atomic context.
> > > > > > > >
> > > > > > > > To fix that and to improve RT properties of zsmalloc, remove that
> > > > > > > > bit spinlock completely and use a bit flag instead.
> > > > > > >
> > > > > > > I don't want to use such open code for the lock.
> > > > > > >
> > > > > > > I see from Mike's patch that the recent zswap change introduced the
> > > > > > > lockdep splat bug, and you want to improve zsmalloc to fix the zswap
> > > > > > > bug by introducing this patch, which allows preemption to stay enabled.
> > > > > >
> > > > > > This understanding is upside down. The code in zswap you are referring
> > > > > > to is not buggy. You may claim that it is suboptimal but there is
> > > > > > nothing wrong in taking a mutex.
> > > > > >
> > > > >
> > > > > Is this suboptimal for all or just the hardware accelerators? Sorry, I
> > > > > am not very familiar with the crypto API. If I select lzo or lz4 as a
> > > > > zswap compressor, will the [de]compression be async or sync?
> > > >
> > > > Right now, in the crypto subsystem, new drivers are required to be
> > > > written against the async APIs. The old sync APIs can't be used in new
> > > > accelerator drivers as they are not supported at all.
> > > >
> > > > Old drivers are sync, but they've got async wrappers to support the
> > > > async APIs, e.g.
> > > >
> > > > crypto: acomp - add support for lz4 via scomp
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lz4.c?id=8cd9330e0a615c931037d4def98b5ce0d540f08d
> > > >
> > > > crypto: acomp - add support for lzo via scomp
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lzo.c?id=ac9d2c4b39e022d2c61486bfc33b730cfd02898e
> > > >
> > > > So they support the async APIs, but they still work in sync mode since
> > > > those old drivers don't sleep.
> > > >
> > >
> > > Good to know that those are sync, because I want them to be sync.
> > > Please note that zswap is a cache in front of real swap, and the load
> > > operation is latency sensitive as it comes in the page fault path and
> > > directly impacts the applications. I doubt that synchronously
> > > decompressing a 4k page on a CPU will be costlier than asynchronously
> > > decompressing the same page on a hardware accelerator.
> >
> > If you read the old paper:
> > https://www.ibm.com/support/pages/new-linux-zswap-compression-functionality
> >
> > Because the hardware accelerator speeds up compression, looking at the zswap
> > metrics we observed that there were more store and load requests in a given
> > amount of time, which filled up the zswap pool faster than a software
> > compression run. Because of this behavior, we set the max_pool_percent
> > parameter to 30 for the hardware compression runs - this means that zswap
> > can use up to 30% of the 10GB of total memory.
> >
> > So with hardware accelerators we get a chance to speed up compression
> > while decreasing CPU utilization.
> >
>
> I don't care much about the compression. It's the decompression, or
> more specifically the latency of decompression, that I really care about.
>
> Compression happens on reclaim, so latency is not really an issue.
> Reclaim can be pressure-based or proactive. I think async batched
> compression by accelerators makes a lot of sense. Though I doubt zswap
> is the right layer for that. To me adding "async batched compression
> support by accelerators" in zram looks more natural as the kernel
> already has async block I/O support.

Yep.
zram is one of the targets I have been thinking about for acomp support.
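
To illustrate what that might look like from a user such as zram, here is a
rough sketch of the acomp calling convention (one-shot, error paths trimmed;
the "lz4" name and the function/buffer choices below are just for the
example, not from an actual zram patch):

#include <crypto/acompress.h>
#include <linux/scatterlist.h>

/* Rough sketch only: compress slen bytes from src into dst.
 * src/dst are assumed kmalloc'ed (scatterlists can't map stack memory).
 */
static int example_acomp_compress(const void *src, unsigned int slen,
				  void *dst, unsigned int *dlen)
{
	struct crypto_acomp *tfm;
	struct acomp_req *req;
	struct scatterlist sg_src, sg_dst;
	DECLARE_CRYPTO_WAIT(wait);
	int ret;

	tfm = crypto_alloc_acomp("lz4", 0, 0);	/* sync algo behind async wrapper */
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	req = acomp_request_alloc(tfm);
	if (!req) {
		crypto_free_acomp(tfm);
		return -ENOMEM;
	}

	sg_init_one(&sg_src, src, slen);
	sg_init_one(&sg_dst, dst, *dlen);
	acomp_request_set_params(req, &sg_src, &sg_dst, slen, *dlen);
	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				   crypto_req_done, &wait);

	/* Async API, but for an scomp-backed algorithm this returns a
	 * final status immediately, so crypto_wait_req() won't sleep. */
	ret = crypto_wait_req(crypto_acomp_compress(req), &wait);
	if (!ret)
		*dlen = req->dlen;

	acomp_request_free(req);
	crypto_free_acomp(tfm);
	return ret;
}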

>
> For decompression, I would like as low a latency as possible, which I
> think is only possible by doing the decompression on a CPU synchronously.

One possibility is that we change the HW accelerator drivers to do sync
polling for decompression. This would still be built on the async API,
since that is the framework nowadays; the difference is that the driver
wouldn't really block, so crypto_wait_req() would return without
actually sleeping.
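
Something like the below (purely hypothetical driver code; hw_submit(),
hw_done() and hw_len() are made-up helpers): if the driver's decompress hook
busy-polls the engine and returns a final status instead of -EINPROGRESS,
crypto_wait_req() on the caller side hands that status straight back without
sleeping.

/* Hypothetical accelerator driver hook: poll the engine to completion
 * instead of returning -EINPROGRESS, so the acomp call is effectively
 * synchronous even though it goes through the async framework. */
static int hw_acomp_decompress(struct acomp_req *req)
{
	hw_submit(req);			/* kick the engine (made-up helper) */
	while (!hw_done())		/* spin instead of sleeping */
		cpu_relax();
	req->dlen = hw_len();		/* made-up helper */
	return 0;			/* final status: crypto_wait_req() won't wait */
}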

Thanks
Barry