Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file contentcorruption on ext3)

From: Linus Torvalds
Date: Wed Dec 20 2006 - 18:56:38 EST




On Wed, 20 Dec 2006, Andrew Morton wrote:
>
> > +void cancel_dirty_page(struct page *page, unsigned int account_size)
> > +{
> > + /* If we're cancelling the page, it had better not be mapped any more */
> > + if (page_mapped(page)) {
> > + static unsigned int warncount;
> > +
> > + WARN_ON(++warncount < 5);
> > + }
> > +
> > + if (TestClearPageDirty(page) && account_size)
> > + task_io_account_cancelled_write(account_size);
> > +}
>
> This doesn't clear the radix-tree dirty tags. I'm not sure what effect
> that would have on a truncated mapping. Perhaps just a bit of extra work
> in radix-tree lookup during writeback.

This should _only_ be a valid thing to do when we're removing the page
from a mapping anyway, so I'd most definitely hope that the code
immediately after (or before) will have done a "remove_from_page_cache()"

In which case the tags should not matter.

There is _no_ excuse for cancelling a page and leaving it in the page
cache that I can see. Because your page contents will be _indeterminate_.

> > @@ -386,12 +399,8 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
>
> invalidate_complete_page2() is pretty gruesome. We're handling the case
> where someone went and redirtied the page (and hence its buffers) after the
> invalidate_inode_pages2() caller (generic_file_direct_IO) synced it to
> disk.
>
> I'd prefer to just fail the direct-io if someone did that, but then
> people's tests fail and they whine.

So with my change, afaik, we will just return EIO to the invalidate, and
do the write. Which should be ok. In fact, it appears to be the only
possibly valid thing to do.

It really boils down to that same thing: if you remove the dirty bit,
there is NO CONCEIVABLE GOOD THING YOU CAN DO EXCEPT FOR:
- do the damn IO already ("clear_page_dirty_for_io()")
- truncate the page (unmap and destroy it both from page cache AND from
any user-visible filesystem cases)

Anything else is simpyl a bug. Always has been. My patch just makes that
very clear.

> With your change I think what'll happen is that we'll correctly handle the
> case where the page and its buffers are dirty (it gets left in place), but
> we'll needlessy fail in the case where the page is dirty but the buffers
> are clean. How important that will be in practice I do not know. People
> will get -EIOs where they used not to.

People will now get -EIO where they used to get an inconsistent system
image. I really think it sounds like an improvement.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/