Re: ReiserFS / 2.4.6 / Data Corruption

From: Andrew Morton
Date: Mon Jul 30 2001 - 10:47:25 EST

Chris Mason wrote:
> On Saturday, July 28, 2001 03:28:05 AM +1000 Andrew Morton wrote:
> [ patch to trigger data writes before commit in reiserfs ]
> >
> > There's no disruption to disk format - it just simulates
> > the user typing `sync' at the right time. I think the
> > concept is sound, and I'm sure Chris can find a more efficient
> > way...
> Well, it gets points for simplicity ;-)

Well, I tried system("/bin/sync"); but that didn't link.

> What I think we need is for commit_write to put new buffers on a per-super
> list of new buffers, and then the journal code can flush that list on
> commit.

whee. Now there's an idea - if the fs keeps track of all its inodes
then you can traverse those and flush out the i_dirty_buffers ring
on each one.
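
A rough userspace sketch of that traversal, in C. Everything here
(struct buf, struct inode_model, struct super_model and the two
flush functions) is a made-up stand-in for illustration, not the
real 2.4 structures, and a plain singly linked list stands in for
the i_dirty_buffers ring. No locking is shown.

```c
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's buffer_head/inode/super_block. */
struct buf {
    struct buf *next;              /* next dirty buffer on the same inode */
    int dirty;                     /* nonzero if the data still needs writing */
    long blocknr;
};

struct inode_model {
    struct inode_model *next;      /* next inode tracked by the fs */
    struct buf *dirty_buffers;     /* models the i_dirty_buffers ring */
};

struct super_model {
    struct inode_model *inodes;    /* every inode the fs keeps track of */
};

/* Stand-in for submitting the buffer to disk. */
static void write_buf(struct buf *b)
{
    b->dirty = 0;
}

/* Drain one inode's dirty list; returns the number of buffers written. */
static int flush_inode_data(struct inode_model *ino)
{
    int written = 0;

    while (ino->dirty_buffers) {
        struct buf *b = ino->dirty_buffers;

        ino->dirty_buffers = b->next;   /* unlink before writing */
        b->next = NULL;
        if (b->dirty) {
            write_buf(b);
            written++;
        }
    }
    return written;
}

/* What the journal code would call just before committing. */
static int flush_data_before_commit(struct super_model *sb)
{
    int written = 0;

    for (struct inode_model *i = sb->inodes; i; i = i->next)
        written += flush_inode_data(i);
    return written;
}
```

The point of the shape is that no new per-buffer list is needed: the
commit path only has to be able to reach every inode.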

writepage() output is a problem, but that never sits well with
journalling. I guess one could do fdatasync/fdatawait against
the same list of inodes.

> Since all the filesystems already mark things BH_New, it seems a good
> choice to let commit_write look for BH_New buffers and put them on this new
> list. But, the only place BH_New seems to get cleared right now is
> unmap_buffer, which only gets called from block_flushpage.
> Is there any reason we can't just clear BH_New before writing the buffer
> out? It looks like a bug to leave it set the way we do now.

I think it can be cleared as soon as the get_block() caller has looked at
it, actually. test_and_clear_bit. The lifecycle of the various buffer_head
fields is exasperatingly fluffy.
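
For what "look at it once, then clear it" means, here is a userspace
model using C11 atomics. It only imitates the semantics of the
kernel's test_and_clear_bit(); struct bh_model, BH_NEW_BIT (and its
value) and consume_new() are assumptions made up for the example.

```c
#include <stdatomic.h>

enum { BH_NEW_BIT = 5 };           /* hypothetical bit position */

struct bh_model {
    atomic_ulong b_state;          /* models buffer_head.b_state */
};

/*
 * Atomically clear bit nr and report whether it was set beforehand.
 * A model of test_and_clear_bit(), not the kernel implementation.
 */
static int test_and_clear_bit_model(int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    unsigned long old = atomic_fetch_and(addr, ~mask);

    return (old & mask) != 0;
}

/*
 * The get_block() caller asks "was this block freshly allocated?"
 * exactly once; the answer is consumed by asking.
 */
static int consume_new(struct bh_model *bh)
{
    return test_and_clear_bit_model(BH_NEW_BIT, &bh->b_state);
}
```

A second call on the same buffer returns 0, so the flag cannot be left
set by accident after the caller has acted on it.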

I'd be reluctant to add another eight bytes to buffer_head though.
It's 96 now, which is a nice number. b_inode can go - it's just
a boolean. b_size and b_list can be crunched into a single byte.
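
One possible way to do the crunching, as a sketch only. It assumes
the block size is always a power of two of at least 512 and that the
list index fits in four bits; neither assumption has been checked
against the tree, and the helper names are invented.

```c
#include <stdint.h>

/*
 * Hypothetical packing: low nibble holds log2(size) - 9,
 * high nibble holds the list index.
 * Assumes size is a power of two >= 512 and list < 16.
 */
static uint8_t pack_size_list(unsigned size, unsigned list)
{
    unsigned shift = 0;

    while ((1u << shift) < size)   /* compute log2(size) */
        shift++;
    return (uint8_t)(((list & 0xFu) << 4) | ((shift - 9) & 0xFu));
}

/* Recover the block size in bytes. */
static unsigned unpack_size(uint8_t packed)
{
    return 1u << ((packed & 0xFu) + 9);
}

/* Recover the list index. */
static unsigned unpack_list(uint8_t packed)
{
    return packed >> 4;
}
```

The cost is a shift on every b_size read, which may or may not be
worth the bytes saved.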

How about just doing it via the inodes?

