Re: Buffer and page cache

Stephen C. Tweedie (sct@redhat.com)
Tue, 2 Nov 1999 16:32:47 +0000 (GMT)


Hi,

On Tue, 02 Nov 1999 08:15:36 -0700, braam@cs.cmu.edu said:

> I'd like these pages to age a little before handing them over to the
> "inode disk", because the "write_one_page" function called by
> generic_file_write would incur significant latency if the inode disk is
> "real", ie. not simulated in the same system.

The write-page method is only required to queue the data for writing to
the media. It is not required to complete the physical IO, so the
filesystem can use any mechanism it likes to keep those pages queued for
eventual physical IO (just as 2.3 uses the buffer lists to queue that
data for eventual writeback via bdflush).

> So we have a page cache for the inodes in the file system where the
> pages become dirty - but no buffers are attached. It reminds of a
> shared mapping, but there is no vma for the pages.

Fine.

> What appears to be needed is the following - probably it's mostly
> lacking in my understanding, but I'd appreciate to be advised how to
> attack the following points:

> - a bit to keep shrink_mmap away from the page.

Yes, bumping the page count is the perfect way to do this.

> - a bit for a struct page that indicates the page needs to be written.
> From block_write_full_page one could think that the PageUptoDate bit is
> maybe the one to use. But does that really describe that this page is
> "dirty" - as it is done for buffers.

PageUpToDate can't be used: it is needed to flag whether the contents of
the page are valid for a read. A written page must always be uptodate:
!uptodate implies that we have created the page but are still reading it
in from disk (or that the readin failed for some reason).

> - some indication of aging: we would like a pgflush daemon to walk the
> dirty pages of the file system and write them back _after_ a little
> while

The fs should be able to manage that on its own. If you queue all of
the pages which have been sent to the writepage() method, then you can
flush to the physical disk whenever you want. A trivial bdflush
lookalike in the fs itself can deal with that.

You might well want a filesystem-private pointer in the page struct off
which to hook any fs-specific data (such as your dirty page linked list
pointers and the dirty flag). You will also need a way for the VM to
exert memory pressure on those pages if it needs to reclaim memory.

These are both things which ext3 will want anyway, so we should make
sure that any infrastructure that gets put in place for this gets
reviewed by all the different fs groups first.

--Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/