RE: Buffer and page cache

braam (braam@carissimi.coda.cs.cmu.edu)
Sun, 7 Nov 1999 22:02:02 -0700


Linus,

Sorry for the delayed reply.

-> >
-> >I'm working on a file system that talks to an "inode disk"; the storage
-> >industry calls these object-based disks. A simulated object-based disk
-> >can be constructed from the lower half of ext2 (or of any other file
-> >system, for that matter).
-> >
-> >The file system has no knowledge of disk blocks, and solely uses the
-> >page cache.
->
-> Look at NFS. It sounds like your filesystem should use the page cache
-> exactly the way NFS uses it.

I think I understand what you have written.

The NFS protocol doesn't really allow for huge caches, since things have to
get back to the server quickly; a file system with locks/callbacks can
cache things forever.

So I want bigger caches - as big as those for ext2 can be - and I suspect
that:

- A - The VM must call a method in my file system to tell me to flush
  pages out.
- B - The page structure must have state (a PageDirty bit?) to let the VM
  select a page for such flushing.
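A rough user-space model of what A and B could look like. Everything here is hypothetical: struct sim_page, struct sim_page_ops, flushpage() and vm_scan() are illustrative names, not the kernel's actual page structure or VM interface. The simulated VM skips pages someone holds a reference to, asks the file system to write out dirty ones, and counts clean, unreferenced pages as reclaimable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not the kernel's struct page. */
struct sim_page;

struct sim_page_ops {
    /* (A) Called by the VM to ask the file system to write a dirty
       page out. Returns 0 on success. */
    int (*flushpage)(struct sim_page *page);
};

struct sim_page {
    bool dirty;                      /* (B) lets the VM pick pages to flush */
    int count;                       /* extra references held by users */
    const struct sim_page_ops *ops;  /* per-file-system methods */
};

/* One pass of a simplified VM scan: ask the file system to flush each
   dirty page nobody else holds, then count clean, unreferenced pages as
   reclaimable. Returns the number of reclaimable pages. */
size_t vm_scan(struct sim_page *pages, size_t n)
{
    size_t reclaimable = 0;
    for (size_t i = 0; i < n; i++) {
        struct sim_page *p = &pages[i];
        if (p->count > 0)
            continue;                         /* in use: skip it */
        if (p->dirty && p->ops && p->ops->flushpage(p) == 0)
            p->dirty = false;                 /* FS wrote it back */
        if (!p->dirty)
            reclaimable++;
    }
    return reclaimable;
}

/* Example method: just records how many flushes were requested. */
static int flush_calls;
static int demo_flushpage(struct sim_page *page)
{
    (void)page;
    flush_calls++;
    return 0;
}
static const struct sim_page_ops demo_ops = { demo_flushpage };
```

A real file system would start I/O in flushpage() and perhaps clear the dirty state only on completion; the model collapses that into a synchronous call.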

If I understand things right, the VM never reclaims pages from NFS, because
NFS raises the page count and the page is then skipped in every scan.

Managing the I/O requests in the FS is fine. A private pointer in the page
structure would be helpful; it would also help NFS find the status of a
page more easily. When the page is being reclaimed by the VM, the FS (or
whatever else is using the pointer) should be told, via a method, to clean
up the pointer.
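A sketch of the private-pointer idea, again as a hypothetical user-space model: fs_private, the release() method, vm_reclaim() and the req_status bookkeeping are names made up for illustration. What it shows is the contract: the owner hangs per-page state off the page, and the VM calls back before it drops the page so that state can be freed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a page carrying a private pointer. */
struct priv_page;

struct priv_page_ops {
    /* Called when the VM reclaims the page, so the owner can free
       whatever fs_private points to. */
    void (*release)(struct priv_page *page);
};

struct priv_page {
    void *fs_private;                /* owner's per-page state */
    const struct priv_page_ops *ops;
};

/* Reclaim: let the owner clean up, then clear the pointer. */
void vm_reclaim(struct priv_page *page)
{
    if (page->fs_private && page->ops && page->ops->release)
        page->ops->release(page);
    page->fs_private = NULL;
}

/* Example owner: per-page write status, freed on release. */
struct req_status { int pending_bytes; };

static int releases;
static void req_release(struct priv_page *page)
{
    free(page->fs_private);
    releases++;
}
static const struct priv_page_ops req_ops = { req_release };

/* Attach status to a page. Returns -1 if the page already has private
   state or allocation fails, 0 otherwise. */
int attach_status(struct priv_page *page, int pending)
{
    if (page->fs_private)
        return -1;
    struct req_status *st = malloc(sizeof *st);
    if (!st)
        return -1;
    st->pending_bytes = pending;
    page->fs_private = st;
    page->ops = &req_ops;
    return 0;
}

/* Look up the status straight from the page, with no separate table. */
int pending_bytes(const struct priv_page *page)
{
    const struct req_status *st = page->fs_private;
    return st ? st->pending_bytes : 0;
}
```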

Am I being naive here?

- Peter -

->
-> >I'd like these pages to age a little before handing them over to the
-> >"inode disk", because the "write_one_page" function called by
-> >generic_file_write would incur significant latency if the inode disk is
-> >"real", i.e. not simulated in the same system.
->
-> No no no. That's NOT how you should see write_one_page() at all.
->
-> write_one_page() really only tells you that the user wrote to the page.
-> It's up to you to write it back some time later, using whatever
-> mechanism you want (and no, that mechanism does NOT have to be a buffer
-> head; in fact the whole setup was explicitly designed to work with
-> arbitrary filesystems, with NFS being the "example" fs that was used to
-> validate the earliest versions).
->
-> If you want to delay the write, that's fine, and you should do so: you
-> should just set up your own data-structures to remember which parts of
-> the page have been written, and do the right coalescing etc, and then
-> have some timeout mechanism to write them _eventually_. You'd also
-> couple it with some way to just force out the write when there are lots
-> of other writes coming in - you don't want to end up with one large
-> burst.
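A minimal sketch of the delayed-write scheme described above, assuming one pending byte range per page and an abstract clock. record_write(), flush_expired(), flush_all(), MAX_PENDING and WRITE_DELAY are hypothetical names, this is not the code in linux/fs/nfs/write.c, and "sending to the disk" is just a counter. Overlapping or adjacent writes to a page are coalesced, a disjoint write pushes the old range out first, a timeout writes requests out eventually, and a cap on pending requests forces writes early so they don't pile up into one burst.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE   4096u
#define MAX_PENDING 4   /* hypothetical: force writes out beyond this many */
#define WRITE_DELAY 5   /* hypothetical timeout, in abstract clock ticks */

/* One delayed write per page: the dirty byte range [start, end). */
struct wreq {
    bool active;
    unsigned start, end;
    long stamp;          /* time of the first write in this request */
};

static size_t npending;                  /* active requests, all pages */
static unsigned long writes_sent, bytes_sent;

/* Stand-in for sending one request to the inode disk. */
static void flush_req(struct wreq *r)
{
    if (!r->active)
        return;
    writes_sent++;
    bytes_sent += r->end - r->start;
    r->active = false;
    npending--;
}

void flush_all(struct wreq *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        flush_req(&reqs[i]);
}

/* Timeout path: write out requests that have waited WRITE_DELAY ticks. */
void flush_expired(struct wreq *reqs, size_t n, long now)
{
    for (size_t i = 0; i < n; i++)
        if (reqs[i].active && now - reqs[i].stamp >= WRITE_DELAY)
            flush_req(&reqs[i]);
}

/* Remember that bytes [off, off+len) of page idx were written at time
   now, instead of writing them immediately. Returns -1 on a bad range. */
int record_write(struct wreq *reqs, size_t n, size_t idx,
                 unsigned off, unsigned len, long now)
{
    if (idx >= n || len == 0 || off > PAGE_SIZE || len > PAGE_SIZE - off)
        return -1;
    struct wreq *r = &reqs[idx];
    unsigned end = off + len;
    if (r->active && (end < r->start || off > r->end))
        flush_req(r);    /* disjoint from the pending range: write it first */
    if (r->active) {     /* overlapping or adjacent: coalesce */
        if (off < r->start)
            r->start = off;
        if (end > r->end)
            r->end = end;
    } else {
        r->active = true;
        r->start = off;
        r->end = end;
        r->stamp = now;
        npending++;
    }
    if (npending > MAX_PENDING)
        flush_all(reqs, n);   /* too much queued: don't let it build up */
    return 0;
}
```

In this model a timer would call flush_expired() periodically, and the file system's write path would call record_write() where it would otherwise have started I/O.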
->
-> And, surprise, surprise, you can just steal the code in NFS to do
-> exactly this. It's all in
->
-> linux/fs/nfs/write.c
->
-> and it's actually fairly simple. Each write request makes sure the page
-> stays in memory by incrementing the page count while a delayed write
-> exists.
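The page-count trick can be modelled in a few lines. It also accounts for the behaviour Peter describes above: while the extra reference is held, the VM's scan sees the page as in use and passes over it. struct cpage and its helpers are hypothetical, and the assumption that the page cache itself holds exactly one reference is a simplification.

```c
#include <stdbool.h>

/* Hypothetical cached page; count includes the page cache's own ref. */
struct cpage {
    int count;
    bool has_delayed_write;
};

void cpage_init(struct cpage *p)
{
    p->count = 1;                 /* the page cache's reference */
    p->has_delayed_write = false;
}

/* Queue a delayed write: pin the page with one extra reference. */
void queue_delayed_write(struct cpage *p)
{
    if (!p->has_delayed_write) {
        p->count++;
        p->has_delayed_write = true;
    }
}

/* Write completed: drop the reference taken above. */
void delayed_write_done(struct cpage *p)
{
    if (p->has_delayed_write) {
        p->has_delayed_write = false;
        p->count--;
    }
}

/* The VM only frees a page whose sole reference is the page cache's. */
bool vm_can_reclaim(const struct cpage *p)
{
    return p->count == 1;
}
```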
->
-> It gets more complicated if you want to coalesce writes over multiple
-> pages etc, but even that is by no means rocket science. It's just a lot
-> more details to keep track of.
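Coalescing across pages mostly comes down to noticing that one page's dirty range ends exactly where the next one's begins. A sketch, assuming the per-page ranges are already sorted by page index; struct dirty_range, struct extent and coalesce() are hypothetical names.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Dirty bytes [start, end) within page number index. */
struct dirty_range { unsigned long index; unsigned start, end; };
/* A contiguous byte range of the file, i.e. one I/O request. */
struct extent { unsigned long offset, length; };

/* Merge per-page dirty ranges (sorted by page index) into as few
   contiguous file extents as possible. out must have room for n
   entries. Returns the number of extents produced. */
size_t coalesce(const struct dirty_range *r, size_t n, struct extent *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long off = r[i].index * PAGE_SIZE + r[i].start;
        unsigned long len = r[i].end - r[i].start;
        if (m > 0 && out[m - 1].offset + out[m - 1].length == off)
            out[m - 1].length += len;   /* abuts the previous extent */
        else
            out[m++] = (struct extent){ off, len };
    }
    return m;
}
```

A real implementation would also cap the extent size at whatever the inode disk accepts in one request.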
->
-> Linus
->
-> -
-> To unsubscribe from this list: send the line "unsubscribe
-> linux-kernel" in
-> the body of a message to majordomo@vger.rutgers.edu
-> Please read the FAQ at http://www.tux.org/lkml/