Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback_entry

From: Zhongkun He
Date: Wed Jan 03 2024 - 09:13:21 EST


> That's around 2.7% increase in real time, no? Admittedly, this
> micro-benchmark is too small to conclude either way, but the data
> doesn't seem to be in your favor.
>
> I'm a bit concerned about the overhead here, given that with this
> patch we will drain the per-cpu batch on every written-back entry.
> That's quite a high frequency, especially since we're moving towards
> more writeback (either with the new zswap shrinker, or your time
> threshold-based writeback mechanism). For instance, there seems to be
> some (local/per-cpu) locking involved, no? Could there be some form of
> lock contentions there (especially since with the new shrinker, you
> can have a lot of concurrent writeback contexts?)
>
> Furthermore, note that a writeback from zswap to swap saves less
> memory than a writeback from memory to swap, so the effect of the
> extra overhead will be even more pronounced. That is, the amount of
> extra work done (from this change) to save one unit of memory would be
> even larger than if we call lru_add_drain() every time we swap out a
> page (from memory -> swap). And I'm pretty sure we don't call
> lru_add_drain() every time we swap out a page - I believe we call
> lru_add_drain() every time we perform a shrink action. For e.g, in
> shrink_inactive_list(). That's much coarser in granularity than here.
>
> Also, IIUC, the more often we perform lru_add_drain(), the less
> batching effect we will obtain. IOW, the overhead of maintaining the
> batch will become higher relative to the performance gains from
> batching.
>
> Maybe I'm missing something - could you walk me through how
> lru_add_drain() is fine here, from this POV? Thanks!
>
> >
> > After writeback, we perform the following steps to release the memory again
> > echo 1g > memory.reclaim
> >
> > Base:
> > total used recalim total used
> > Mem: 38Gi 2.5Gi ----> 38Gi 1.5Gi
> > Swap: 5.0Gi 1.0Gi ----> 5Gi 1.5Gi
> > used memory -1G swap +0.5g
> > It means that half of the pages are failed to move to the tail of lru list,
> > So we need to release an additional 0.5Gi anon pages to swap space.
> >
> > With this patch:
> > total used recalim total used
> > Mem: 38Gi 2.6Gi ----> 38Gi 1.6Gi
> > Swap: 5.0Gi 1.0Gi ----> 5Gi 1Gi
> >
> > used memory -1Gi, swap +0Gi
> > It means that we release all the pages which have been add to the tail of
> > lru list in zswap_writeback_entry() and folio_rotate_reclaimable().
> >
>
> OTOH, this suggests that we're onto something. Swap usage seems to
> decrease quite a bit. Sounds like a real problem that this patch is
> tackling.
> (Please add this benchmark result to future changelog. It'll help
> demonstrate the problem).

Yes

>
> I'm inclined to ack this patch, but it'd be nice if you can assuage my
> concerns above (with some justification and/or larger benchmark).
>

OK,thanks.

> (Or perhaps, we have to drain, but less frequently/higher up the stack?)
>

I've reviewed the code again and have no idea. It would be better if
you have any suggestions.

New test:
This patch will add the execution of folio_rotate_reclaimable(not executed
without this patch) and lru_add_drain,including percpu lock competition.
I bind a new task to allocate memory and use the same batch lock to compete
with the target process, on the same CPU.
context:
1:stress --vm 1 --vm-bytes 1g (bind to cpu0)
2:stress --vm 1 --vm-bytes 5g --vm-hang 0(bind to cpu0)
3:reclaim pages, and writeback 5G zswap_entry in cpu0 and node 0.

Average time of five tests

Base patch patch + compete
4.947 5.0676 5.1336
+2.4% +3.7%
compete means: a new stress run in cpu0 to compete with the writeback process.
PID USER %CPU %MEM TIME+ COMMAND P
1367 root 49.5 0.0 1:09.17 bash (writeback) 0
1737 root 49.5 2.2 0:27.46 stress (use percpu
lock) 0

around 2.4% increase in real time,including the execution of
folio_rotate_reclaimable(not executed without this patch) and lru_add_drain,but
no lock contentions.

around 1.3% additional increase in real time with lock contentions on the same
cpu.

There is another option here, which is not to move the page to the
tail of the inactive
list after end_writeback and delete the following code in
zswap_writeback_entry(),
which did not work properly. But the pages will not be released first.

/* move it to the tail of the inactive list after end_writeback */
SetPageReclaim(page);

Thanks,
Zhongkun

> Thanks,
> Nhat
>
> >
> > Thanks for your time Nhat and Andrew. Happy New Year!