Re: [PATCH]: 4/4 cluster page-out in VM scanner

From: Hans Reiser
Date: Sun Nov 21 2004 - 12:14:06 EST


How well does this integrate with reiser4.;-)

Hans

Nikita Danilov wrote:

Implement pageout clustering at the VM level.

With this patch VM scanner calls pageout_cluster() instead of
->writepage(). pageout_cluster() tries to find a group of dirty pages around
target page, called "pivot" page of the cluster. If group of suitable size is
found, ->writepages() is called for it, otherwise, page_cluster() falls back
to ->writepage().

This is supposed to help in work-loads with significant page-out of
file-system pages from tail of the inactive list (for example, heavy dirtying
through mmap), because file system usually writes multiple pages more
efficiently. Should also be advantageous for file-systems doing delayed
allocation, as in this case they will allocate whole extents at once.

Few points:

- swap-cache pages are not clustered (although they can be, but by
page->private rather than page->index)

- currently, kswapd clusters all the time, and direct reclaim only when
device queue is not congested. Probably direct reclaim shouldn't cluster at
all.

- this patch adds new fields to struct writeback_control and expects
->writepages() to interpret them. This is needed, because pageout_cluster()
calls ->writepages() with pivot page already locked, so that ->writepages()
is allowed to only trylock other pages in the cluster.

Besides, rather rough plumbing (wbc->pivot_ret field) is added to check
whether ->writepages() failed to write pivot page for any reason (in latter
case page_cluster() falls back to ->writepage()).

Only mpage_writepages() was updated to honor these new fields, but
all in-tree ->writepages() implementations seem to call
mpage_writepages(). (Except reiser4, of course, for which I'll send a
(trivial) patch, if necessary).

Numbers that talk:

Averaged number of microseconds it takes to dirty 1GB of
16-times-larger-than-RAM ext3 file mmaped in 1GB chunks:

without-patch: average: 74188417.156250
deviation: 10538258.613280

with-patch: average: 69449001.583333
deviation: 12621756.615280

(Patch is for 2.6.10-rc2)




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/