Re: [PATCH 3/3][RFC] swsusp: shrink file cache first

From: Johannes Weiner
Date: Fri Feb 06 2009 - 07:25:06 EST


On Fri, Feb 06, 2009 at 02:59:35PM +0900, KOSAKI Motohiro wrote:
> Hi
>
> > > if we think about suspend performance, we should consider that the swap device
> > > and the file-backed device can be different block devices.
> > > interleaving file-backed page-out and swap-out can improve total write-out performance.
> >
> > Hm, good point. We could probably improve that, but I don't think it's
> > too pressing because at least on my test boxen, the actual shrinking time
> > is really short compared to the total time of suspending to disk.
>
> ok.
> the only remaining problem is posting measurement results :)
>
>
> > > if we think about resume performance, we should consider making the on-disk
> > > layout of swap follow the contiguity of the process's virtual address space.
> > > that would reduce unnecessary seeks.
> > > but your patch doesn't do this.
> > >
> > > Could you explain this patch benefit?
> >
> > The patch tries to shrink those pages first that are most unlikely to
> > be needed again after resume. It assumes that active anon pages are
> > needed immediately after resume while inactive file pages are not. So
> > it defers shrinking anon pages until after the file cache.
>
> hmm, I'm confused.
> I agree active anon is more important than inactive file.
> but I don't understand why the scanning order at suspend changes the order at resume.

This is the problem: on suspend, we can save only about 50% of memory
in the suspend image because of the snapshotting. So we have to
shrink memory before suspending. Since more than 50% of RAM is usually
in use, you practically always have to shrink. And the image is always
the same size.

After restoring the image, resuming processes want to continue their
work immediately and the user wants to use the applications again as
soon as possible.

Everything that is saved in the suspend image is restored and back in
memory when the processes resume their work.

Everything that is NOT saved in the suspend image is still on swap or
not yet in the page cache when the processes resume their work.

So if we shrink memory in the wrong order, then after restoring the
image we have page cache in memory that is not needed, while the anon
pages that are needed are swapped out.

The goal is that after restoring the image, as much of the working set
as possible is back in memory, while the pages left in swap or only on
disk are those unlikely to be used immediately by the resumed
processes, so they can continue their work without much disk IO.
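
To make the ordering concrete, here is a toy userspace model of the
idea (my illustration only, not the actual shrink_all_memory() code;
the LRU categories and page counts are made-up assumptions):

#include <stdio.h>

enum lru_kind {
	INACTIVE_FILE,	/* cheapest to refault, drop first */
	ACTIVE_FILE,
	MAPPED_FILE,
	INACTIVE_ANON,
	ACTIVE_ANON,	/* most likely needed right after resume */
	NR_KINDS
};

static const char *kind_name[NR_KINDS] = {
	"inactive file", "active file", "mapped file",
	"inactive anon", "active anon",
};

int main(void)
{
	/* pages on each list before suspend (made-up numbers) */
	long pages[NR_KINDS] = { 40000, 30000, 20000, 25000, 35000 };
	long total = 0, target;
	int i;

	for (i = 0; i < NR_KINDS; i++)
		total += pages[i];
	/* the image can hold roughly half of RAM */
	target = total / 2;

	/* reclaim the cheapest-to-refault kinds first */
	for (i = 0; i < NR_KINDS && total > target; i++) {
		long nr = total - target;

		if (nr > pages[i])
			nr = pages[i];
		total -= nr;
		printf("reclaimed %6ld %s pages\n", nr, kind_name[i]);
	}
	printf("%ld pages left for the suspend image\n", total);
	return 0;
}

With that ordering, the anon lists are only touched once all the
cheaper file pages are gone, which is what the patch tries to arrange.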

> > But I just noticed that the old behaviour defers it as well, because
> > even if it does scan anon pages from the beginning, it allows writing
> > only starting from pass 3.
>
> Ah, I see.
> it's obviously wrong.
>
> > I couldn't quite understand what you wrote about on-disk
> > contiguousness, but that claim still stands: faulting in contiguous
> > pages from swap can be much slower than faulting file pages. And my
> > patch prefers mapped file pages over anon pages. This is probably
> > where I have seen the improvements after resume in my tests.
>
> sorry, I don't understand yet.
> Why does "prefer mapped file pages over anon pages" make a large improvement?

Because contiguously mapped file pages are faster to read back in than
a group of anon pages. Or at least that is my claim.

And if we have to evict some of the working set just because the
working set is bigger than 50% of memory, then it's better to evict
those pages that are cheaper to refault.
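
Back-of-envelope numbers (a userspace sketch with assumed, not
measured, device characteristics; swap is taken as the worst case of
one seek per page):

#include <stdio.h>

int main(void)
{
	const double seek_ms = 8.0;		/* assumed average seek time */
	const double seq_mb_per_s = 60.0;	/* assumed sequential throughput */
	const long nr_pages = 10000;		/* 10000 * 4 KiB ~= 39 MiB */
	double mb = nr_pages * 4.0 / 1024.0;

	/* contiguous file pages: one seek, then a sequential read */
	double file_ms = seek_ms + mb / seq_mb_per_s * 1000.0;
	/* scattered swap pages: up to one seek per page */
	double swap_ms = nr_pages * seek_ms;

	printf("contiguous file refault: ~%.0f ms\n", file_ms);
	printf("scattered swap refault:  ~%.0f ms (worst case)\n", swap_ms);
	return 0;
}

Even if swap is nowhere near that worst case in practice, the gap
suggests that evicting the file-backed part of the working set first is
the cheaper mistake to make.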

Does that make sense?

> > Yes, I'm still thinking about how to quantify it properly. I have
> > not yet found a reliable way to check whether the working set is
> > intact besides seeing whether the resumed applications are responsive
> > right away or whether they first have to swap their pages back in.
>
> thanks.
> I'm looking forward to it :)

Thanks to YOU, also for reviewing!

> > > > @@ -2134,17 +2144,17 @@ unsigned long shrink_all_memory(unsigned
> > > >
> > > > /*
> > > > * We try to shrink LRUs in 5 passes:
> > > > - * 0 = Reclaim from inactive_list only
> > > > - * 1 = Reclaim from active list but don't reclaim mapped
> > > > - * 2 = 2nd pass of type 1
> > > > - * 3 = Reclaim mapped (normal reclaim)
> > > > - * 4 = 2nd pass of type 3
> > > > + * 0 = Reclaim unmapped inactive file pages
> > > > + * 1 = Reclaim unmapped file pages
> > >
> > > I think your patch reclaims mapped file pages at passes 0 and 1 too.
> >
> > Doesn't the following check in shrink_page_list prevent this:
> >
> > if (!sc->may_swap && page_mapped(page))
> > goto keep_locked;
> >
> > ?
>
> Grr, you are right.
> I agree, currently may_swap doesn't control whether we swap out or not.
> so I think we should change it to a more accurate name ;)

Agreed. What do you think about the following patch?

---
Subject: vmscan: rename may_swap scan control knob

may_swap applies not only to anon pages but to mapped file pages as
well. Rename it to may_unmap, which reflects its actual meaning.

Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9a27c44..2523600 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -60,8 +60,8 @@ struct scan_control {

int may_writepage;

- /* Can pages be swapped as part of reclaim? */
- int may_swap;
+ /* Reclaim mapped pages */
+ int may_unmap;

/* This context's SWAP_CLUSTER_MAX. If freeing memory for
* suspend, we effectively ignore SWAP_CLUSTER_MAX.
@@ -606,7 +606,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
if (unlikely(!page_evictable(page, NULL)))
goto cull_mlocked;

- if (!sc->may_swap && page_mapped(page))
+ if (!sc->may_unmap && page_mapped(page))
goto keep_locked;

/* Double the slab pressure for mapped and swapcache pages */
@@ -1694,7 +1694,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
.gfp_mask = gfp_mask,
.may_writepage = !laptop_mode,
.swap_cluster_max = SWAP_CLUSTER_MAX,
- .may_swap = 1,
+ .may_unmap = 1,
.swappiness = vm_swappiness,
.order = order,
.mem_cgroup = NULL,
@@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
{
struct scan_control sc = {
.may_writepage = !laptop_mode,
- .may_swap = 1,
+ .may_unmap = 1,
.swap_cluster_max = SWAP_CLUSTER_MAX,
.swappiness = swappiness,
.order = 0,
@@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
struct zonelist *zonelist;

if (noswap)
- sc.may_swap = 0;
+ sc.may_unmap = 0;

sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
@@ -1762,7 +1762,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
struct reclaim_state *reclaim_state = current->reclaim_state;
struct scan_control sc = {
.gfp_mask = GFP_KERNEL,
- .may_swap = 1,
+ .may_unmap = 1,
.swap_cluster_max = SWAP_CLUSTER_MAX,
.swappiness = vm_swappiness,
.order = order,
@@ -2109,7 +2109,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
struct reclaim_state reclaim_state;
struct scan_control sc = {
.gfp_mask = GFP_KERNEL,
- .may_swap = 0,
+ .may_unmap = 0,
.swap_cluster_max = nr_pages,
.may_writepage = 1,
.swappiness = vm_swappiness,
@@ -2147,7 +2147,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)

/* Force reclaiming mapped pages in the passes #3 and #4 */
if (pass > 2) {
- sc.may_swap = 1;
+ sc.may_unmap = 1;
sc.swappiness = 100;
}

@@ -2292,7 +2292,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
int priority;
struct scan_control sc = {
.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
- .may_swap = !!(zone_reclaim_mode & RECLAIM_SWAP),
+ .may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
.swap_cluster_max = max_t(unsigned long, nr_pages,
SWAP_CLUSTER_MAX),
.gfp_mask = gfp_mask,
--