Re: [PATCH 5/6] writeback: sync expired inodes first in background writeback

From: Andrew Morton
Date: Wed Apr 20 2011 - 19:41:39 EST

Next message: Andrew Morton: "Re: [PATCH 1/4] export kernel call get_task_comm()."
Previous message: Ben Nizette: "Re: [PATCH 1/2] gpio: add pin biasing and drive mode to gpiolib"
In reply to: Wu Fengguang: "[PATCH 5/6] writeback: sync expired inodes first in background writeback"
Next in thread: Wu Fengguang: "Re: [PATCH 5/6] writeback: sync expired inodes first in background writeback"

On Wed, 20 Apr 2011 16:03:41 +0800
Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:

> A background flush work may run for ever. So it's reasonable for it to
> mimic the kupdate behavior of syncing old/expired inodes first.
>
> At each queue_io() time, first try enqueuing only newly expired inodes.
> If there are zero expired inodes to work with, then relax the rule and
> enqueue all dirty inodes.
>
> This will help reduce the number of dirty pages encountered by page
> reclaim, e.g. the pageout() calls. Normally older inodes contain older
> dirty pages, which are closer to the end of the LRU lists. So
> syncing older inodes first helps reduce the dirty pages reached by
> the page reclaim code.
>
> More background: as Mel put it, "it makes sense to write old pages first
> to reduce the chances page reclaim is initiating IO."
>
> Rik also presented the situation with a graph:
>
> LRU head [*] dirty page
> [ * * * * * * * * * * *]
>
> Ideally, most dirty pages should lie close to the LRU tail instead of
> LRU head. That requires the flusher thread to sync old/expired inodes
> first (as there are obvious correlations between inode age and page
> age), and to give fair opportunities to newly expired inodes rather
> than sticking with some large eldest inodes (as larger inodes have
> weaker correlations in the inode<=>page ages).
>
> This patch helps the flusher to meet both the above requirements.
>
> Side effects: it might reduce the batch size and hence reduce
> inode_wb_list_lock hold time, but in turn make the cluster-by-partition
> logic in the same function less effective at reducing disk seeks.

One of the many requirements for writeback is that if userspace is
continually dirtying pages in a particular file, that shouldn't cause
the kupdate function to concentrate on that file's newly-dirtied pages,
neglecting pages from other files which were less-recently dirtied
(and dirty inodes, etc).

And the background writeback function and fsync() and msync() and
everything else shouldn't cause starvation of expired pages, either. I
guess you could say that the expired dirty pages become the
highest-priority writeback item.

Are you testing for this failure scenario? If so, can you briefly
describe the testing?

It would be helpful if you could explain how the current code
implements this requirement.
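
For readers following the thread, the two-step rule in the quoted changelog
(queue newly expired inodes first; if there are none, relax the rule and
queue all dirty inodes) can be sketched in userspace C roughly as below.
This is an illustration only, not the code from the patch: struct toy_inode,
queue_older_than(), toy_queue_io() and the expire_interval parameter are
hypothetical names, the dirty list is modelled as a plain array, and
timestamps are assumed not to wrap.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dirty inode; not the kernel's struct inode. */
struct toy_inode {
	unsigned long dirtied_when;	/* time the inode was first dirtied */
	bool queued;			/* set once selected for writeback */
};

/*
 * Queue every not-yet-queued inode dirtied at or before 'cutoff'.
 * Returns the number of inodes queued by this call.
 */
static size_t queue_older_than(struct toy_inode *dirty, size_t n,
			       unsigned long cutoff)
{
	size_t queued = 0;

	for (size_t i = 0; i < n; i++) {
		if (!dirty[i].queued && dirty[i].dirtied_when <= cutoff) {
			dirty[i].queued = true;
			queued++;
		}
	}
	return queued;
}

/*
 * Expired inodes first; only when none are expired, fall back to
 * queueing all dirty inodes.  Assumes timestamps never wrap.
 */
static size_t toy_queue_io(struct toy_inode *dirty, size_t n,
			   unsigned long now, unsigned long expire_interval)
{
	size_t queued = 0;

	/* First pass: only inodes older than the expiry interval. */
	if (now >= expire_interval)
		queued = queue_older_than(dirty, n, now - expire_interval);

	/* Nothing expired: relax the rule and take everything. */
	if (queued == 0)
		queued = queue_older_than(dirty, n, ULONG_MAX);

	return queued;
}
```

With inodes dirtied at times 100, 900 and 990, now = 1000 and an expiry
interval of 300, only the first inode is queued, so a file that keeps
getting redirtied cannot push ahead of the older one in that round. With
only the 900 and 990 inodes present, nothing is expired and both are queued.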
