Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup

From: Yosry Ahmed
Date: Fri Jun 16 2023 - 04:41:06 EST


On Fri, Jun 16, 2023 at 1:37 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 16.06.23 10:04, Yosry Ahmed wrote:
> > On Fri, Jun 16, 2023 at 12:57 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 16.06.23 09:37, Yosry Ahmed wrote:
> >>> On Thu, Jun 15, 2023 at 9:41 PM 贺中坤 <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>>> Thanks Fabian for tagging me.
> >>>>>
> >>>>> I am not familiar with #1, so I will speak to #2. Zhongkun, There are
> >>>>> a few parts that I do not understand -- hopefully you can help me out
> >>>>> here:
> >>>>>
> >>>>> (1) If I understand correctly, in this patch we set the active memcg,
> >>>>> trying to charge any pages allocated for a zspage to the current memcg,
> >>>>> yet that zspage will contain multiple compressed object slots, not
> >>>>> just the one used by this memcg. Aren't we overcharging the memcg?
> >>>>> Basically the first memcg that happens to allocate the zspage will pay
> >>>>> for all the objects in this zspage, even after it stops using the
> >>>>> zspage completely?
> >>>>
> >>>> It will not overcharge. As you said below, we are not using
> >>>> __GFP_ACCOUNT here; the compressed slots are charged to the memcgs
> >>>> in patch 3.
> >>>>
> >>>>>
> >>>>> (2) Patch 3 seems to be charging the compressed slots to the memcgs,
> >>>>> yet this patch is trying to charge the entire zspage. Aren't we double
> >>>>> charging the zspage? I am guessing this isn't happening because (as
> >>>>> Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
> >>>>> this patch may be a NOP, and the actual charging is coming from patch 3
> >>>>> only.
> >>>>
> >>>> YES, the actual charging is coming from patch 3. This patch just
> >>>> delivers the BIO page's memcg to the current task, which is not the
> >>>> consumer.
> >>>>
> >>>>>
> >>>>> (3) Zswap recently implemented per-memcg charging of compressed
> >>>>> objects in a much simpler way. If your main interest is #2 (which is
> >>>>> what I understand from the commit log), it seems like zswap might be
> >>>>> providing this already? Why can't you use zswap? Is it the fact that
> >>>>> zswap requires a backing swapfile?
> >>>>
> >>>> Thanks for your reply and review. Yes, zswap requires a backing
> >>>> swapfile. The I/O path is very complex, and sometimes it will throttle
> >>>> the whole system if some resources are short, so we hope to use zram.
> >>>
> >>> Is the only problem with zswap for you the requirement of a backing swapfile?
> >>>
> >>> If yes, I am in the early stages of developing a solution to make
> >>> zswap work without a backing swapfile. This was discussed in LSF/MM
> >>> [1]. Would this make zswap usable for your use case?
> >>
> >> Out of curiosity, are there any other known pros/cons when using
> >> zswap-without-swap instead of zram?
> >>
> >> I know that zram requires sizing (the size of the virtual block device)
> >> and consumes metadata, while zswap doesn't.
> >
> > We don't use zram in our data centers so I am not an expert about
> > zram, but off the top of my head there are a few more advantages to
> > zswap:
>
> Thanks!
>
> > (1) Better memcg support (which this series is attempting to address
> > in zram, although in a much more complicated way).
>
> Right. I think this patch also misses applying the charging in the
> recompress case (only triggered by user space, IIUC).
>
> >
> > (2) We internally have incompressible memory handling on top of zswap,
> > which is something that we would like to upstream when
> > zswap-without-swap is supported. Basically if a page does not compress
> > well enough to save memory we reject it from zswap and make it
> > unevictable (if there is no backing swapfile). The existence of zswap
> > in the MM layer helps with this. Since zram is a block device from the
> > MM perspective, it's more difficult to do something like this.
> > Incompressible pages just sit in zram AFAICT.
>
> I see. With ZRAM_HUGE we still have to store the uncompressed page
> (because it's a block device and has to hold that data).

Right.
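
For anyone following along, that path is roughly the following
(paraphrased and heavily simplified from drivers/block/zram/zram_drv.c,
not a verbatim quote):

	ret = zcomp_compress(zstrm, src, &comp_len);
	if (comp_len >= huge_class_size)
		/* Incompressible: store the full page and flag the slot. */
		comp_len = PAGE_SIZE;

	handle = zs_malloc(zram->mem_pool, comp_len, gfp_flags);
	/* ... map the handle and copy the (un)compressed data ... */
	if (comp_len == PAGE_SIZE)
		zram_set_flag(zram, index, ZRAM_HUGE);

So an incompressible page costs a full PAGE_SIZE in the pool until the
slot is freed (or explicitly written back to a backing device).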

>
> >
> > (3) Writeback support. If you're running out of memory to store
> > compressed pages, you can add a swapfile at runtime and zswap will
> > start writing to it freeing up space to compress more pages. This
> > wouldn't be possible in the same way in zram. Zram supports writing to
> > a backing device but in a more manual way (userspace has to write to
> > an interface to tell zram to write some pages).
>
> Right, that zram backing device stuff is really sub-optimal and only useful
> in corner cases (most probably not in datacenters).
>
> What one can do with zram is to add a second swap device with lower priority.
> Looking at my Fedora machine:
>
> $ cat /proc/swaps
> Filename        Type        Size       Used   Priority
> /dev/dm-2       partition   16588796   0      -2
> /dev/zram0      partition   8388604    0      100
>
>
> I guess the difference here is that you won't be writing out the
> compressed data to the disk, but anything that gets swapped out
> afterwards will end up on the disk. I can see how the zswap behavior
> might be better in that case (instead of swapping out some additional
> pages, you relocate the already-swapped-out-to-zswap pages to the disk).

Yeah I am hoping we can enable the use of zswap without a backing
swapfile, and I keep seeing use cases that would benefit from that.
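
For reference, the zswap charging mentioned in (1) above boils down to a
few calls in the store/free paths -- roughly the following (paraphrased
from mm/zswap.c, not a verbatim quote):

	/* On store: charge the compressed size to the page's owner. */
	objcg = get_obj_cgroup_from_page(page);
	if (objcg && !obj_cgroup_may_zswap(objcg))
		goto reject;	/* the real code may try shrinking first */
	/* ... compress and stash the entry ... */
	obj_cgroup_charge_zswap(objcg, entry->length);

	/* On free: */
	obj_cgroup_uncharge_zswap(entry->objcg, entry->length);

The compressed size gets charged to the memcg that owns the original
page, not to whoever happens to be running the store, which is the main
thing this series is trying to get right for zram.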

>
> --
> Cheers,
>
> David / dhildenb
>