Re: Buffer and page cache

V. Ganesh (ganesh@veritas.com)
Tue, 2 Nov 1999 15:09:33 -0800 (PST)


> Ah. I suppose I was incomplete in expressing my thought, and I'll try
> again: I was thinking that in your writepage function, you could mark it
> dirty and bump the page count. Since the page's lock is held when
> writepage is called, you don't have to worry about incrementing count
> twice. Multiple writepage()s can occur simultaneously, whereas only one
> write() can occur on a file at one time. Am I making sense? In any case,
> if you haven't looked over Eric's shmfs patches (see the linux-mm
> archives), do -- he implemented dirty pages in the page cache, which will
> be worth revisiting during 2.5.
>
> -ben

ok, let me try again.
state                             page->count

free                              0
in pagecache, clean               1
write                             2   <----- A
write done, still in pagecache    1   <----- B
write                             2
write done                        1

ok, so we've done two writes to this page and it's still in the pagecache.
however, the fact that it is dirty is not recorded anywhere in struct page
itself. it's implicitly stored in the state of page->buffers. now if you
have a filesystem like nfs which doesn't want to use page->buffers, then
you have no way of recording this fact.

you could, at point A, bump up refcount by one more, so that it is 2 rather
than 1 after the write is done. this will make it safe wrt shrink_mmap, but
what about future writes? should they bump up refcount by 2 again, leaving
the count one higher after every write? there's no getting away from the
fact that dirtiness is a state and can't be represented by a count.
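
to make the count-vs-state point concrete, here is a small userspace sketch.
everything in it is hypothetical: struct toy_page is a simplified stand-in, not
the kernel's struct page, and the dirty field and the three helper functions
are made up for illustration. it only contrasts taking an extra reference on
every write with pinning the page once when it goes from clean to dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical, simplified stand-in for struct page (not the kernel's) */
struct toy_page {
    int  count;   /* reference count, as in the table above */
    bool dirty;   /* hypothetical explicit dirty state */
};

/* scheme 1: try to encode "dirty" as an extra reference taken at point A */
void write_extra_ref(struct toy_page *p)
{
    p->count += 2;   /* one ref for the write itself, one meant as "dirty" */
    /* ... data would be copied into the page here ... */
    p->count -= 1;   /* write done: only the write's own ref is dropped */
}

/* scheme 2: keep dirtiness as state; pin only on the clean -> dirty edge */
void write_dirty_state(struct toy_page *p)
{
    p->count += 1;           /* ref held for the duration of the write */
    if (!p->dirty) {
        p->dirty = true;
        p->count += 1;       /* a single pin, however many writes follow */
    }
    p->count -= 1;           /* write done */
}

/* writeback finished: clear the state and drop the single pin */
void writeback_done(struct toy_page *p)
{
    assert(p->dirty && p->count > 1);
    p->dirty = false;
    p->count -= 1;
}
```

starting from a clean page with count 1, two calls to write_extra_ref() leave
the count at 3, and nothing says how many references to drop once the page is
written back. two calls to write_dirty_state() leave it at 2, and
writeback_done() brings it back to 1.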

what nfs does is implement its own writeback mechanism. on the first write to
a page, it is added to an internal writeback queue and refcount is bumped up
so that it is 2 after the writepage returns. this insulates the page from
shrink_mmap. future writes to the same page can <handwave> probably
find the page in the internal queue itself, so page->count is not further
incremented. </handwave>
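
a rough userspace sketch of that scheme, in the same handwaving spirit. this
is not the code in fs/nfs/write.c: struct wb_page, struct wb_queue,
toy_writepage() and toy_flush() are all hypothetical names, and the
fixed-size array stands in for whatever the real queue looks like. the only
point is that the reference is taken when the page first enters the queue and
not again on later writes.

```c
#include <assert.h>
#include <stddef.h>

#define WB_QUEUE_MAX 16          /* arbitrary size for the sketch */

/* hypothetical stand-in for struct page: only the refcount matters here */
struct wb_page {
    int count;
};

/* hypothetical per-filesystem writeback queue */
struct wb_queue {
    struct wb_page *pages[WB_QUEUE_MAX];
    size_t len;
};

/* returns 1 if the page is already queued for writeback, 0 otherwise */
int wb_queue_contains(const struct wb_queue *q, const struct wb_page *p)
{
    for (size_t i = 0; i < q->len; i++)
        if (q->pages[i] == p)
            return 1;
    return 0;
}

/*
 * returns 1 if the page was newly queued (and pinned), 0 if it was already
 * queued (count left alone), -1 if the queue is full.
 */
int toy_writepage(struct wb_queue *q, struct wb_page *p)
{
    if (wb_queue_contains(q, p))
        return 0;                /* later write: no further increment */
    if (q->len == WB_QUEUE_MAX)
        return -1;
    q->pages[q->len++] = p;
    p->count++;                  /* first write: pin the page */
    return 1;
}

/* pretend every queued page has been written back: drop the pins */
void toy_flush(struct wb_queue *q)
{
    for (size_t i = 0; i < q->len; i++) {
        assert(q->pages[i]->count > 1);
        q->pages[i]->count--;
    }
    q->len = 0;
}
```

with a page that starts at count 1, the first toy_writepage() takes it to 2, a
second call on the same page leaves it at 2, and toy_flush() returns it to 1.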

so we can do it, but nfs and nfs-like filesystems can't use the generic_*
mechanisms. it's ok if there's just one such fs, but if we have three or
four, then it begins to make sense to shift such functionality to the generic
code. it would have little or no impact on other filesystems and would
probably clean up the whole vmscan stuff anyway.

here's a quote from fs/nfs/write.c:

 * FIXME: Interaction with the vmscan routines is not optimal yet.
 * Either vmscan must be made nfs-savvy, or we need a different page
 * reclaim concept that supports something like FS-independent
 * buffer_heads with a b_ops-> field.

ganesh
