Re: [patch 3/3][rfc] vmscan: batched swap slot allocation

From: KAMEZAWA Hiroyuki
Date: Mon Apr 20 2009 - 21:00:43 EST


On Mon, 20 Apr 2009 22:24:45 +0200
Johannes Weiner <hannes@xxxxxxxxxxx> wrote:

> Every swap slot allocation tries to be subsequent to the previous one
> to help keep the LRU order of anon pages intact when they are
> swapped out.
>
> With an increasing number of concurrent reclaimers, the average
> distance between two subsequent slot allocations of one reclaimer
> increases as well. The contiguous LRU list chunks each reclaimer
> swaps out get 'multiplexed' on the swap space as they allocate the
> slots concurrently.
>
> 2 processes isolating 15 pages each and allocating swap slots
> concurrently:
>
> #0 #1
>
> page 0 slot 0 page 15 slot 1
> page 1 slot 2 page 16 slot 3
> page 2 slot 4 page 17 slot 5
> ...
>
> -> average slot distance of 2
>
> All reclaimers being equally fast, this becomes a problem when the
> total number of concurrent reclaimers gets so high that even equal
> distribution makes the average distance between the slots of one
> reclaimer too wide for optimistic swap-in to compensate.
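
(Just to make the arithmetic above concrete, here is a tiny user-space
model of the interleaving, assuming all reclaimers allocate in strict
round-robin; it is only an illustration of the changelog's example, not
kernel code. The average distance between one reclaimer's slots simply
grows with the number of concurrent reclaimers:)

/*
 * Toy user-space model, not kernel code: nr reclaimers each hold a
 * chunk of 15 isolated pages and take turns allocating one swap slot
 * at a time from a single global allocator, like shrink_page_list()
 * does today.  The average distance between two consecutive slots of
 * the same reclaimer then equals the number of concurrent reclaimers.
 */
#include <stdio.h>

#define CHUNK		15
#define MAX_RECLAIMERS	8

int main(void)
{
	int nr, r, page;

	for (nr = 1; nr <= MAX_RECLAIMERS; nr++) {
		int slots[MAX_RECLAIMERS][CHUNK];
		int next_slot = 0;
		int gaps = 0;

		/* Round-robin: every reclaimer allocates one slot per turn. */
		for (page = 0; page < CHUNK; page++)
			for (r = 0; r < nr; r++)
				slots[r][page] = next_slot++;

		/* Sum the gaps between consecutive slots of reclaimer 0. */
		for (page = 1; page < CHUNK; page++)
			gaps += slots[0][page] - slots[0][page - 1];

		printf("%d reclaimer(s): average slot distance %d\n",
		       nr, gaps / (CHUNK - 1));
	}
	return 0;
}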
>
> But right now, one reclaimer can take much longer than another
> because its pages are mapped into more page tables and it thus has
> more work to do; the faster reclaimer will then allocate multiple
> swap slots between two slot allocations of the slower one.
>
> This patch makes shrink_page_list() allocate swap slots in batches,
> collecting all the anonymous pages on a list without rescheduling or
> doing actual reclaim in between. Only after all anon pages have been
> added to the swap cache does unmapping and write-out start for them.
>
> While this does not fix the fundamental issue of the slot distance
> increasing with the number of reclaimers, it mitigates the problem by
> balancing the resulting fragmentation equally between the allocators.
>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Hugh Dickins <hugh@xxxxxxxxxxx>
> ---
> mm/vmscan.c | 49 +++++++++++++++++++++++++++++++++++++++++--------
> 1 files changed, 41 insertions(+), 8 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 70092fa..b3823fe 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -592,24 +592,42 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> enum pageout_io sync_writeback)
> {
> LIST_HEAD(ret_pages);
> + LIST_HEAD(swap_pages);
> struct pagevec freed_pvec;
> - int pgactivate = 0;
> + int pgactivate = 0, restart = 0;
> unsigned long nr_reclaimed = 0;
>
> cond_resched();
>
> pagevec_init(&freed_pvec, 1);
> +restart:
> while (!list_empty(page_list)) {
> struct address_space *mapping;
> struct page *page;
> int may_enter_fs;
> int referenced;
>
> - cond_resched();
> + if (list_empty(&swap_pages))
> + cond_resched();
>
Why this? (Why is cond_resched() skipped once pages sit on the
swap_pages list?)

> page = lru_to_page(page_list);
> list_del(&page->lru);
>
> + if (restart) {
> + /*
> + * We are allowed to do IO when we restart for
> + * swap pages.
> + */
> + may_enter_fs = 1;
> + /*
> + * Referenced pages will be sorted out by
> + * try_to_unmap() and unmapped (anon!) pages
> + * are not to be referenced anymore.
> + */
> + referenced = 0;
> + goto reclaim;
> + }
> +
> if (!trylock_page(page))
> goto keep;
>
Keeping multiple pages locked while they stay on the private list?

BTW, isn't it better to add an "allocate multiple swap slots at once"
function, like
 - void get_swap_pages(nr, swp_entry_array[])
? "nr" would not be bigger than SWAP_CLUSTER_MAX.

Regards,
-Kame
