On Mon, Jan 09, 2012 at 06:10:23PM -0500, Rik van Riel wrote:
+	get_swap_cluster(entry, &offset, &end_offset);
+
+	for (; offset <= end_offset; offset++) {
 		/* Ok, do the async read-ahead now */
 		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
 						gfp_mask, vma, addr);
 		if (!page)
-			break;
+			continue;
 		page_cache_release(page);
 	}
For a heavily fragmented swap file, this will result in more IO, and
the gamble is that the pages nearby will be needed soon. You say your
virtual machines swap in faster, and that does not surprise me. I also
expect they need the data, so it's a net win.
There is a possibility that, under memory pressure, swapping in
more pages will cause more memory pressure (increased swapin causing
clean page cache discards and pageout) and be an overall loss. This may
be a net loss in some cases, such as where the working set size is just
over physical memory and the increased swapin causes a problem. I doubt
this case is common, but it is worth bearing in mind if future bug
reports complain about increased swap activity.
-	si = swap_info[swp_type(entry)];
-	target = swp_offset(entry);
-	base = (target >> our_page_cluster) << our_page_cluster;
-	end = base + (1 << our_page_cluster);
-	if (!base)		/* first page is swap header */
-		base++;
+	si = swap_info[swp_type(entry)];
+	/* Round the begin down to a page_cluster boundary. */
+	offset = (offset >> page_cluster) << page_cluster;
Minor nit, but it would feel more natural to me to see

	offset & ~((1 << page_cluster) - 1)

but I understand that you are reusing the existing code.
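For what it's worth, the two forms agree for every offset; a throwaway
userspace check (not kernel code, and the page_cluster value here is just
the common default of 3) along these lines passes:

	#include <assert.h>

	int main(void)
	{
		unsigned long page_cluster = 3;	/* 8-page clusters */
		unsigned long offset;

		/* Both forms round offset down to a cluster boundary. */
		for (offset = 0; offset < 4096; offset++)
			assert(((offset >> page_cluster) << page_cluster) ==
			       (offset & ~((1UL << page_cluster) - 1)));
		return 0;
	}

so it really is only a readability preference.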
+	*begin = offset;
+	/* Round the end up, but not beyond the end of the swap device. */
+	offset = offset + (1 << page_cluster);
+	if (offset > si->max)
+		offset = si->max;
+	*end = offset;
 	spin_unlock(&swap_lock);
-
-	/*
-	 * Indicate starting offset, and return number of pages to get:
-	 * if only 1, say 0, since there's then no readahead to be done.
-	 */
-	*offset = ++toff;
-	return nr_pages ? ++nr_pages : 0;
 }
This section deletes code, which is nice, but there is a problem.
Your changelog says that this is duplicating the effort of
read_swap_cache_async(), which is true, but what read_swap_cache_async()
does is
1. a swap cache lookup which will probably fail
2. alloc_page_vma()
3. radix_tree_preload()
4. swapcache_prepare
   - calls __swap_duplicate()
   - finds the hole, bails out
That's a lot of work before the hole is found. Would it be worth
doing a racy check in swapin_readahead, without the swap lock held, before
calling read_swap_cache_async()?
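Something like the following is what I have in mind, purely as an untested
sketch (it assumes swapin_readahead can get at the swap_info_struct for the
entry, e.g. via a small helper, and that a racy read of swap_map is
acceptable because a stale value only costs one wasted attempt):

	si = swap_info[swp_type(entry)];
	for (; offset <= end_offset; offset++) {
		/*
		 * Racy peek without swap_lock: a zero count means the slot
		 * is unused, so skip the page allocation, radix tree preload
		 * and swapcache_prepare() that read_swap_cache_async() would
		 * otherwise do before bailing out.
		 */
		if (!si->swap_map[offset])
			continue;
		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
					     gfp_mask, vma, addr);
		if (!page)
			continue;
		page_cache_release(page);
	}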