Re: 2.6.19 file content corruption on ext3

From: Linus Torvalds
Date: Tue Dec 19 2006 - 19:04:51 EST




On Wed, 20 Dec 2006, Peter Zijlstra wrote:

> On Tue, 2006-12-19 at 14:58 -0800, Andrew Morton wrote:
>
> > Well... we'd need to see (corruption && this-not-triggering) to be sure.
> >
> > Peter, have you been able to trigger the corruption?
>
> Yes; however the mail I send describing that seems to be lost in space.

Btw, can somebody actually explain the mess that is ext3 "dirtying".

Ext3 does NOT use __set_page_dirty_buffers. It does

static int ext3_journalled_set_page_dirty(struct page *page)
{
SetPageChecked(page);
return __set_page_dirty_nobuffers(page);
}

and uses that "Checked" bit as a "whole page is dirty" bit (which it tests
in "writepage()".

You realize what this all means? It means that ANYTHING that actually
clears the _real_ dirty bit won't actually be doing anything at all for
ext3, because the Checked bit will still stay set, and any IO down the
line on that page would totally ignore the dirty bits on the buffer heads
and just write out everything.

That is "The Mess(tm)".

It also basically means that anything that clears the dirty bit without
just calling "writepage()" had _better_ call "invalidatepage()" for the
whole page, because otherwise the PageChecked bit will never be cleared as
far as I can see. Happily, at least ext3 seems to _test_ for that case in
the release_page() function, so it appears that we do do this.

But this seems to just strengthen my argument: you can NEVER clean a page,
unless you (a) do IO on it immediately afterwards (writeback) or (b)
invalidate it entirely (truncate).

I'd really like to see just those two functions exist. Preferably in a
form where you can see easily that we actually follow those rules. Rather
than having a confusing set of "clear_page_dirty()" and
"test_and_clear_page_dirty()" functions that are called from random
places.

IOW, I think the "clear_page_dirty_for_io()" is fine (it's case (a))
above, and then we should probably have a "cancel_dirty_page()" function
that does all the current clear_page_dirty() but also makes sure that we
actually call the invalidate_page() function itself.

Hmm?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/