Re: [PATCH 2/3] NFS: lock the readdir page while it is in use

From: Trond Myklebust
Date: Wed Dec 01 2010 - 09:50:21 EST


On Tue, 2010-11-30 at 21:06 -0800, Linus Torvalds wrote:
> On Tue, Nov 30, 2010 at 8:29 PM, Trond Myklebust
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >
> > I'm not worried about other readdir calls invalidating the page. My
> > concern is rather about the VM memory reclaimers ejecting the page from
> > the page cache, and calling nfs_readdir_clear_array while we're
> > referencing the page.
>
> I think you're making a fundamental mistake here, and you're confused
> by a much deeper problem.
>
> The thing is, the ".releasepage" callback gets called _before_ the
> page actually gets removed from the page cache, and there is no
> guarantee that it will always be removed at all!
>
> In fact, anybody holding a reference to it will result in
> page_freeze_refs() not successfully clearing all the refs, and that in
> turn will abort the actual freeing of the page. So while you hold the
> page count, your page will NOT be freed. Guaranteed.
>
> But it is true that the ".releasepage()" function may be called. So if
> your NFS release callback ends up invalidating the data on that page,
> that page lock thing will make a difference, yes.
>
> But at the same time, are you sure that you are able to then handle
> the case of that page still existing in the page cache and being used
> afterwards? Looking at the code, it doesn't look that way to me.
>
> So I think you're confused, and the NFS code totally incorrectly
> thinks that ".releasepage" is something that happens at the last use
> of the page. It simply is not so. In fact, you seem to return 0, which
> I think means "failure to release", so the VM will just mark it busy
> again afterwards.
>
> Now, I think you do have a few options:
>
> - keep the current model. BUT! In the page cache release function
> (nfs_readdir_clear_array), make sure that you also clear the
> up-to-date bit, so that the page gets read back in since it no longer
> contains any valid information. And return success for the
> "releasepage" operatioin.

This is the option I'm attempting for now. I'm quite fine with the page
only being readable while under the page lock since this is readdir, and
so we don't have to worry about mmap() and its ilk.

My latest incarnation of the patches has added in all the PagePrivate()
foo to make try_to_release_page() work, and also adds in an
invalidatepage() address space operation (I rediscovered that
truncate_inode_pages() will otherwise call block_invalidatepage, and
subsequently Oops).

I'm still in the process of testing, but the latest patchset appears to
be doing all the right things for now. I'll post an update later today
so that Nick and others can test too.

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/