Re: [RFC] [PATCH] Dirty pages in the page cache

Eric W. Biederman (ebiederm+eric@npwt.net)
12 Jan 1998 21:49:38 -0600


>>>>> "ST" == Stephen C Tweedie <sct@dcs.ed.ac.uk> writes:

ST> Hi,

>> The basic idea of use is a filesystem will set an extra dirty bit on a
>> page when a write occurs to it. And if the dirty bit is set it is
>> guaranteed that before the page is removed from memory `writepage' will
>> be called. For filesystems that need more precise tracking they
>> should be able to do that on their own.

ST> One big question with this: _should_ this sort of thing be done directly
ST> in the page cache? Most filesystems will require extra information in
ST> order to keep their current write semantics. Ext2fs will require
ST> information about which blocks within the buffer are dirty, especially
ST> for short files on 1k block filesystems. NFS will require credential
ST> information.

I don't know if we need to do this for all filesystems. Network
filesystems seem to be good candidates. We could include an extra
void * pointer in struct page that could be used for tracking
extra information (or contain it directly). Though we could as easily
reuse the struct buffer_head as a generic data pointer, for a page
cache page. (But usually struct buffer heads?)
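
A minimal sketch of the extra-pointer idea, written as user-space C
rather than kernel code. Every name here (cache_page, fs_private,
blockfs_page_info, netfs_page_info) is hypothetical and chosen only for
illustration; none of it is the kernel's actual struct page layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page cache page (NOT the kernel's struct page). */
struct cache_page {
    unsigned long offset;  /* offset of this page within the file */
    int dirty;             /* nonzero when the page holds unwritten data */
    void *fs_private;      /* opaque per-filesystem data, NULL if unused */
};

/* Hypothetical data a block-based filesystem might hang off the page:
 * one bit per block in the page, set when that block is dirty. */
struct blockfs_page_info {
    unsigned int dirty_blocks;
};

/* Hypothetical data a network filesystem might hang off the page:
 * the credentials to use when the page is written back. */
struct netfs_page_info {
    unsigned int uid;
    unsigned int gid;
};

/* Record that one block of the page is dirty and flag the page itself. */
static void blockfs_mark_block_dirty(struct cache_page *page,
                                     struct blockfs_page_info *info,
                                     unsigned int block)
{
    assert(block < 8 * sizeof(info->dirty_blocks));
    info->dirty_blocks |= 1u << block;
    page->fs_private = info;
    page->dirty = 1;
}

/* Remember whose credentials dirtied the page and flag the page itself. */
static void netfs_mark_dirty(struct cache_page *page,
                             struct netfs_page_info *info,
                             unsigned int uid, unsigned int gid)
{
    info->uid = uid;
    info->gid = gid;
    page->fs_private = info;
    page->dirty = 1;
}
```

The generic code only ever looks at the dirty flag; what fs_private
points to is known only to the filesystem that set it.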

My primary target (besides my own fs) when I designed it was
filesystems with compressed files. Those need separate caching of
the file data, and what is written to the disk. Something the page
cache is good at.

Also one tricky element is reducing cache size when memory is low.
If shrink_mmap doesn't shrink the page cache, all kinds of interesting
things can occur: system slowdowns, getfreepage lockups, etc. This
is an important justification for doing it in the page cache directly
so shrink_mmap can free the pages, or at least start freeing them.

My thought (assuming we are using generic_file_write) is that
update_page can do the filesystem-specific setup of just how it is
to be done and set the dirty bit, and then writepage will be called
when the dirty bit is set. So it is still optional, and the specifics
are controlled by the individual filesystems.
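
The flow described above can be sketched roughly as follows, again as
user-space C and not as the real kernel interfaces. The update_page and
writepage hooks are named after the ones in the text, but their
signatures, the page_ops table, generic_write and try_to_free_page are
all assumptions made up for this illustration.

```c
#include <stddef.h>

struct cache_page;

/* Hypothetical per-filesystem hooks; signatures are assumed. */
struct page_ops {
    void (*update_page)(struct cache_page *page); /* fs-specific setup on write */
    int  (*writepage)(struct cache_page *page);   /* returns 0 on success */
};

/* Hypothetical stand-in for a page cache page. */
struct cache_page {
    int dirty;                  /* set by the filesystem in update_page */
    int writes_seen;            /* demo only: counts writepage calls */
    const struct page_ops *ops; /* hooks of the owning filesystem */
};

/* Generic write path: after the data is copied in, let the filesystem
 * decide how to track it. A filesystem without the hook is unaffected. */
static void generic_write(struct cache_page *page)
{
    if (page->ops && page->ops->update_page)
        page->ops->update_page(page);
}

/* Reclaim path: a dirty page is never freed before writepage succeeds.
 * Returns 1 if the page may be freed, 0 if it must stay in memory. */
static int try_to_free_page(struct cache_page *page)
{
    if (page->dirty) {
        if (!page->ops || !page->ops->writepage)
            return 0;
        if (page->ops->writepage(page) != 0)
            return 0;
        page->dirty = 0;
    }
    return 1;
}

/* A trivial demo filesystem that opts in to dirty tracking. */
static void demo_update_page(struct cache_page *page)
{
    page->dirty = 1;
}

static int demo_writepage(struct cache_page *page)
{
    page->writes_seen++;
    return 0;
}

static const struct page_ops demo_ops = { demo_update_page, demo_writepage };
```

A clean page is freed without calling into the filesystem at all, which
is what keeps the scheme optional.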

ST> An alternative way which would certainly make it simpler to maintain
ST> ext2fs semantics with minimal effort would be to make the ext2 write
ST> code just a little bit smarter than it is: instead of copying user data
ST> into buffers and doing a separate vm update, it could overlay the
ST> required buffers on top of the page cache, sharing the physical page,
ST> and let the existing bdflush write out the pages eventually.

ST> The struct buffer_head cache is a fairly natural place to keep
ST> information about physically dirty block device blocks, but the data
ST> itself could just as easily stay in the page cache. It would be a
ST> fairly trivial extension to the buffer.c async logic to allow
ST> asynchronous writeback buffers to be marked as free-after-IO (currently
ST> we only do free-after-IO on anonymous buffer_heads). We already have
ST> the necessary timing logic to do 30-second writeback of buffer_heads,
ST> too.

ST> There's a big reason to keep the page cache dirty, though, and that is
ST> for speed of fsync(). Fortunately, the page->buffers pointer would give
ST> us an easy way of finding (and syncing) all dirty buffers associated
ST> with each page on an inode's page-cache ring.

A definite advantage I hadn't realized. Still, you can use the page
cache dirty bit as a signal that at least one buffer is dirty. The
replication wouldn't hurt.
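
As a rough illustration of using the page dirty bit as a summary of its
buffers, here is a user-space sketch. The types and functions (buf,
cache_page, mark_buffer_dirty_in_page, sync_page) are hypothetical and
only model the idea; they are not the buffer.c interfaces.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer head overlaid on a page. */
struct buf {
    int dirty;
    struct buf *next_in_page; /* next buffer sharing the same page */
};

/* Hypothetical stand-in for a page cache page. */
struct cache_page {
    int dirty;           /* summary: at least one buffer may be dirty */
    struct buf *buffers; /* buffers overlaid on this page */
};

/* Dirty one buffer and replicate that fact into the page's dirty bit. */
static void mark_buffer_dirty_in_page(struct cache_page *page, struct buf *b)
{
    b->dirty = 1;
    page->dirty = 1;
}

/* fsync-style pass over one page: clean pages are skipped without
 * walking their buffers. Returns the number of buffers "written". */
static int sync_page(struct cache_page *page)
{
    int written = 0;
    struct buf *b;

    if (!page->dirty)
        return 0;
    for (b = page->buffers; b != NULL; b = b->next_in_page) {
        if (b->dirty) {
            b->dirty = 0; /* real code would start I/O here */
            written++;
        }
    }
    page->dirty = 0;
    return written;
}
```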

Still this leaves open the question of what to do about readpage &
writepage, and the fact NFS needs dentries for them.

Eric