Re: 2.6.6-rc{1,2} bad VM/NFS interaction in case of dirty page writeback

From: Trond Myklebust
Date: Tue Apr 27 2004 - 10:41:07 EST


On Tue, 2004-04-27 at 01:58, Andrew Morton wrote:

> > Currently, we use this on virtually anything that triggers writepage()
> > with the wbc->for_reclaim flag set.
>
> Why? Sorry, I still do not understand the effect which you're trying to
> achieve? I thought it was an efficiency thing: send more traffic via
> writepages(). But your other comments seem to indicate that this is not
> the primary reason.

That is the primary reason. Under NFS, writing is a two-stage process
(roughly sketched in the code below):

- Stage 1: "schedule" the page for writing. Basically this entails
creating an nfs_page struct for each dirty page and putting it on the
NFS private list of dirty pages.
- Stage 2: "flush" the NFS private list of dirty pages. This involves
coalescing contiguous nfs_page structs into chunks of size "wsize", and
actually putting them on the wire in the form of RPC requests.
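To make the two stages concrete, here is a rough kernel-style sketch. The
helper names (nfs_page_alloc, nfs_schedule_page, nfs_flush_dirty_list,
nfs_send_write_rpc) and the per-inode ->dirty list are illustrative only,
not the actual 2.6 NFS client code; only the list primitives and
PAGE_CACHE_SIZE are real:

	/* Illustrative sketch only -- helpers and fields are hypothetical. */

	/* Stage 1: "schedule" a dirty page by queueing an nfs_page for it. */
	static void nfs_schedule_page(struct inode *inode, struct page *page)
	{
		struct nfs_page *req = nfs_page_alloc(page);	/* hypothetical */

		/* No RPC yet: just park the request on the per-inode dirty list. */
		list_add_tail(&req->wb_list, &NFS_I(inode)->dirty);
	}

	/*
	 * Stage 2: "flush" the private dirty list, coalescing queued requests
	 * into chunks of up to wsize bytes (contiguity checks omitted) and
	 * sending one WRITE RPC per chunk.
	 */
	static void nfs_flush_dirty_list(struct inode *inode, unsigned int wsize)
	{
		LIST_HEAD(chunk);
		unsigned int bytes = 0;
		struct nfs_page *req, *tmp;

		list_for_each_entry_safe(req, tmp, &NFS_I(inode)->dirty, wb_list) {
			list_move_tail(&req->wb_list, &chunk);
			bytes += PAGE_CACHE_SIZE;
			if (bytes >= wsize) {
				nfs_send_write_rpc(inode, &chunk);  /* hypothetical */
				bytes = 0;
			}
		}
		if (bytes)
			nfs_send_write_rpc(inode, &chunk);
	}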

writepage() only deals with one page at a time, so it works fine for
doing stage 1.
If you also try to force it to do stage 2, then you end up with chunks
of size <= PAGE_CACHE_SIZE, because there will only ever be one page on
the NFS private list of dirty pages. In practice this means that you
usually have to send 8 times as many RPC requests to the server in order
to process the same amount of data.

This is why I'd like to try to leave stage 2 to writepages().

Now normally, that is not a problem, since the actual calls to
writepage() are made via generic_writepages() from within
nfs_writepages(), so I can do the stage 2 flush once
generic_writepages() returns.
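In other words (again just a sketch, reusing the hypothetical helper from
above; error handling simplified), the writepages() path looks roughly
like:

	/* Rough sketch of the writepages()-driven path, not the literal code. */
	static int nfs_writepages_sketch(struct address_space *mapping,
					 struct writeback_control *wbc)
	{
		struct inode *inode = mapping->host;
		int err;

		/*
		 * Stage 1: generic_writepages() calls ->writepage() on each
		 * dirty page in the range, which only queues an nfs_page
		 * (no RPC yet).
		 */
		err = generic_writepages(mapping, wbc);
		if (err < 0)
			return err;

		/*
		 * Stage 2: everything is queued now, so coalesce into
		 * wsize-sized WRITE RPCs in one go.
		 */
		nfs_flush_dirty_list(inode, NFS_SERVER(inode)->wsize);
		return 0;
	}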

However, shrink_cache() (where the MM calls writepage() directly) needs
to be treated specially, since it is not wrapped by a writepages()
call. In that case we return WRITEPAGE_ACTIVATE and wait for pdflush to
call writepages().
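Sketched the same way, the writepage() side then just needs to
distinguish the reclaim case (WRITEPAGE_ACTIVATE and wbc->for_reclaim are
real; the helpers are the hypothetical ones from above):

	/* Rough sketch, not the literal nfs_writepage() code. */
	static int nfs_writepage_sketch(struct page *page,
					struct writeback_control *wbc)
	{
		/* Stage 1 always happens: queue the page for a later flush. */
		nfs_schedule_page(page->mapping->host, page);

		if (wbc->for_reclaim) {
			/*
			 * Called directly from shrink_cache(): don't flush a
			 * one-page chunk here, let pdflush's writepages()
			 * call do the coalesced flush later.
			 */
			return WRITEPAGE_ACTIVATE;
		}

		/*
		 * Called via generic_writepages(): nfs_writepages() will
		 * flush once the whole range has been queued.
		 */
		unlock_page(page);
		return 0;
	}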

Cheers,
Trond