Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback_entry

From: Zhongkun He
Date: Fri Jan 05 2024 - 09:10:30 EST


>
> Hmm originally I was thinking of doing an (unconditional)
> lru_add_drain() outside of zswap_writeback_entry() - once in
> shrink_worker() and/or zswap_shrinker_scan(), before we write back any
> of the entries. Not sure if it would work/help here tho - haven't
> tested that idea yet.
>

The pages are allocated by __read_swap_cache_async() in
zswap_writeback_entry() and it must be newly allocated,
not cached in swap.
Please see the code below in zswap_writeback_entry()

page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
NO_INTERLEAVE_INDEX, &page_was_allocated);
if (!page) {
goto fail;}
/* Found an existing page, we raced with load/swapin */
if (!page_was_allocated) {
put_page(page);
ret = -EEXIST;
goto fail;
}

So when it comes to SetPageReclaim(page),
The page has just been allocated and is still in the percpu batch,
which has not been added to the LRU.

Therefore,lru_add_drain() did not work outside the
zswap_writeback_entry()

> >
> > New test:
> > This patch will add the execution of folio_rotate_reclaimable(not executed
> > without this patch) and lru_add_drain,including percpu lock competition.
> > I bind a new task to allocate memory and use the same batch lock to compete
> > with the target process, on the same CPU.
> > context:
> > 1:stress --vm 1 --vm-bytes 1g (bind to cpu0)
> > 2:stress --vm 1 --vm-bytes 5g --vm-hang 0(bind to cpu0)
> > 3:reclaim pages, and writeback 5G zswap_entry in cpu0 and node 0.
> >
> > Average time of five tests
> >
> > Base patch patch + compete
> > 4.947 5.0676 5.1336
> > +2.4% +3.7%
> > compete means: a new stress run in cpu0 to compete with the writeback process.
> > PID USER %CPU %MEM TIME+ COMMAND P
> > 1367 root 49.5 0.0 1:09.17 bash (writeback) 0
> > 1737 root 49.5 2.2 0:27.46 stress (use percpu
> > lock) 0
> >
> > around 2.4% increase in real time,including the execution of
> > folio_rotate_reclaimable(not executed without this patch) and lru_add_drain,but
> > no lock contentions.
>
> Hmm looks like the regression is still there, no?

Yes, it cannot be eliminated.

>
> >
> > around 1.3% additional increase in real time with lock contentions on the same
> > cpu.
> >
> > There is another option here, which is not to move the page to the
> > tail of the inactive
> > list after end_writeback and delete the following code in
> > zswap_writeback_entry(),
> > which did not work properly. But the pages will not be released first.
> >
> > /* move it to the tail of the inactive list after end_writeback */
> > SetPageReclaim(page);
>
> Or only SetPageReclaim on pages on LRU?

No, all the pages are newly allocted and not on LRU.

This patch should add lru_add_drain() directly, remove the
if statement.
The purpose of writing back data is to release the page,
so I think it is necessary to fix it.

Thanks for your time, Nhat.