Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback_entry

From: Zhongkun He
Date: Tue Jan 02 2024 - 06:39:38 EST


On Sat, Dec 30, 2023 at 10:09 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Tue, Oct 24, 2023 at 7:27 AM Zhongkun He
> <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
> >
>
> My apologies for the delayed response. I have a couple of questions.
>
> > The zswap_writeback_entry() will add a page to the swap cache, decompress
> > the entry data into the page, and issue a bio write to write the page back
> > to the swap device. Move the page to the tail of lru list through
> > SetPageReclaim(page) and folio_rotate_reclaimable().
> >
> > Currently, about half of the pages will fail to move to the tail of lru
>
> May I ask what's the downstream effect of this? i.e so what if it
> fails? And yes, as Andrew pointed out, it'd be nice if the patch
> changelog spells out any observable or measurable change from the
> user's POV.
>

The swap cache page used to decompress zswap_entry should be
moved to the tail of the inactive list after end_writeback, We can release
them in time.Just like the following code in zswap_writeback_entry().

/* move it to the tail of the inactive list after end_writeback */
SetPageReclaim(page);

After the writeback is over, the function of
folio_rotate_reclaimable() will fail
because the page is not in the LRU list but in some of the cpu folio batches.
Therefore we did not achieve the goal of setting SetPageReclaim(page), and
the pages could not be free in time.

> > list because there is no LRU flag in page which is not in the LRU list but
> > the cpu_fbatches. So fix it.
>
> This sentence is a bit confusing to me. Does this mean the page
> currently being processed for writeback is not in the LRU list
> (!PageLRU(page)), but IN one of the cpu folio batches? Which makes
> folio_rotate_reclaimable() fails on this page later on in the
> _swap_writepage() path? (hence the necessity of lru_add_drain()?)
>

Yes, exactly.

> Let me know if I'm misunderstanding the intention of this patch. I
> know it's a bit pedantic, but spelling things out (ideally in the
> changelog itself) will help the reviewers, as well as future
> contributors who want to study the codebase and make changes to it.
>

Sorry,my bad.

> >
> > Signed-off-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx>
>
> Thanks and look forward to your response,
> Nhat
>
> P/S: Have a nice holiday season and happy new year!

Here are the steps and results of the performance test:
1:zswap+ zram (simplified model with on IO)
2:disabel zswap/parameters/same_filled_pages_enabled (stress have same pages)
3:stress --vm 1 --vm-bytes 2g --vm-hang 0 (2Gi anon pages)
4: In order to quickly release zswap_entry, I used the previous
patch (I will send it again later).
https://lore.kernel.org/all/20231025095248.458789-1-hezhongkun.hzk@xxxxxxxxxxxxx/

Performance result:
reclaim 1Gi zswap_entry

time echo 1 > writeback_time_threshold
(will release the zswap_entry, not been accessed for more than 1 seconds )

Base With this patch
real 0m1.015s real 0m1.043s
user 0m0.000s user 0m0.001s
sys 0m1.013s sys 0m1.040s
So no obvious performance regression was found.

After writeback, we perform the following steps to release the memory again
echo 1g > memory.reclaim

Base:
total used recalim total used
Mem: 38Gi 2.5Gi ----> 38Gi 1.5Gi
Swap: 5.0Gi 1.0Gi ----> 5Gi 1.5Gi
used memory -1G swap +0.5g
It means that half of the pages are failed to move to the tail of lru list,
So we need to release an additional 0.5Gi anon pages to swap space.

With this patch:
total used recalim total used
Mem: 38Gi 2.6Gi ----> 38Gi 1.6Gi
Swap: 5.0Gi 1.0Gi ----> 5Gi 1Gi

used memory -1Gi, swap +0Gi
It means that we release all the pages which have been add to the tail of
lru list in zswap_writeback_entry() and folio_rotate_reclaimable().


Thanks for your time Nhat and Andrew. Happy New Year!