Some thoughts on write-back page cache caching

Eric W. Biederman (ebiederm+eric@npwt.net)
01 Nov 1997 16:45:52 -0600


I have been pondering the issues of caching in compressed filesystems,
and how to get Posix.4 shared memory areas into the linux kernel. It
occurred to me that a decent way to work on all of these goals would be
to write a filesystem that resides in swap, and is cached only in the
page cache.

I have that filesystem working to the proof-of-concept stage now, though
it needs some more work to make it practical. The one truly
innovative feature of this filesystem is that it uses a single btree
to hold all of the directories for the whole filesystem. This
btree is working read/write, and its performance on large file names
(that don't get in the dcache) is better than that of the ext2 fs. Its
reads/writes are about on par with ext2, as ext2 is quite efficient.

My most recent alpha version of this filesystem (against linux-2.0.29)
is located at http://www.nwpt.net/~ebiederm/files/shmfs-0.0.017.tar.gz

In writing this filesystem I have found a few possibly worthwhile kernel
modifications.
1) Several more page cache functions need to be exported.
2) Implementation of the PG_dirty bit, to track dirty pages.
(I have a preliminary patch below.)
3) The kernel handling of mmaped files needs to be reviewed.

- I believe that invalidate_inode_pages is wrong when pages are mmaped,
as it does not unmap those pages, yet removes them from the page
cache.

- Truncate inode pages needs a second look as well. What are the
correct semantics when the following happens?
page_size = getpagesize();
ftruncate(fd, 4*page_size);
mapping = mmap(NULL, 4*page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
ftruncate(fd, 1*page_size);
write(fd, buf, 3*page_size); /* to the end of the file */
Should the last 3 pages of the mapping be equal to what was written or
not (assume different data was written than was there in the first place)?

4) How should we discover (assume a machine that is _not_ swapping)
that mmaped pages have been written to?

5) How do we guarantee that all changes get written periodically? We
handle this for the buffer cache but there is no support _yet_ for
the page cache.

6) How should we handle swapping through the page cache? The nfs code
has proposed using writepage() to handle swapping through the page
cache. I use it in a slightly different way, to be notified of
changes (but unfortunately only when we are trying to free memory).
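
The truncate question in item 3 can be tried from user space. The
following is a minimal sketch, assuming a POSIX system and a writable
scratch path. The function name mapping_sees_rewrite and the path are
made up for illustration, and the fill bytes 'A' and 'B' stand in for
the old and new data. It only reports what the running kernel does; it
says nothing about what the correct semantics ought to be.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Run the truncate/write sequence on a scratch file (hypothetical path).
 * Returns 1 if the last three pages of the mapping show the newly
 * written data, 0 if they do not, and -1 if any system call fails.
 */
int mapping_sees_rewrite(const char *path)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t len, tail;
	char *mapping = MAP_FAILED;
	char *buf = NULL;
	int result = -1;
	int fd;

	if (page_size <= 0)
		return -1;
	len = 4 * (size_t)page_size;
	tail = 3 * (size_t)page_size;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	unlink(path);		/* the file disappears once fd is closed */

	buf = malloc(tail);
	if (!buf)
		goto out;

	if (ftruncate(fd, (off_t)len) < 0)
		goto out;
	mapping = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mapping == MAP_FAILED)
		goto out;

	/* Put known "old" data into all four pages through the mapping. */
	memset(mapping, 'A', len);

	/* Shrink the file to one page while the mapping stays in place. */
	if (ftruncate(fd, (off_t)page_size) < 0)
		goto out;

	/* Write different data from the new end of file onwards. */
	memset(buf, 'B', tail);
	if (lseek(fd, (off_t)page_size, SEEK_SET) < 0)
		goto out;
	if (write(fd, buf, tail) != (ssize_t)tail)
		goto out;

	/* The file is four pages long again, so the mapping is readable. */
	result = (memcmp(mapping + page_size, buf, tail) == 0);
out:
	if (mapping != MAP_FAILED)
		munmap(mapping, len);
	free(buf);
	close(fd);
	return result;
}
```

Calling mapping_sees_rewrite() with a scratch path on the kernel and
filesystem of interest, and printing the result, gives one data point
for that combination.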

Patch for a simple implementation of inode dirty pages.
Policy is left to the filesystem:

diff -uNr linux-2.1.61/include/linux/mm.h linux-2.1.61.eric/include/linux/mm.h
--- linux-2.1.61/include/linux/mm.h Fri Oct 31 23:05:37 1997
+++ linux-2.1.61.eric/include/linux/mm.h Sat Nov 1 15:54:26 1997
@@ -138,6 +138,7 @@
#define PG_DMA 7
#define PG_Slab 8
#define PG_swap_cache 9
+#define PG_dirty 10
#define PG_reserved 31

/* Make it prettier to test the above... */
diff -uNr linux-2.1.61/mm/filemap.c linux-2.1.61.eric/mm/filemap.c
--- linux-2.1.61/mm/filemap.c Fri Oct 31 23:06:34 1997
+++ linux-2.1.61.eric/mm/filemap.c Sat Nov 1 15:57:48 1997
@@ -149,6 +149,22 @@
} while (tmp != bh);
}

+ /* If a page is dirty clean it */
+ if (PageDirty(page) && page->inode &&
+ page->inode->i_op && page->inode->i_op->writepage) {
+ int error;
+ error = page->inode->i_op->writepage(page->inode, page);
+ if (!error) {
+ clear_bit(PG_dirty, &(page)->flags);
+ }
+ return !error;
+ }
+ else {
+ /* If a writepage isn't supported PG_dirty is useless.
+ */
+ clear_bit(PG_dirty, &(page)->flags);
+ }
+
/* We can't throw away shared pages, but we do mark
them as referenced. This relies on the fact that
no page is currently in both the page cache and the