Re: Problem in log_do_checkpoint()?

From: Jan Kara
Date: Mon Apr 11 2005 - 06:38:15 EST


Hello,

> On Mon, 2005-04-04 at 10:04, Jan Kara wrote:
>
> > In log_do_checkpoint() we go through the t_checkpoint_list of a
> > transaction and call __flush_buffer() on each buffer. Suppose there is
> > just one buffer on the list and it is dirty. __flush_buffer() sees it and
> > puts it to an array of buffers for flushing. Then the loop finishes,
> > retry=0, drop_count=0, batch_count=1. So __flush_batch() is called - we
> > drop all locks and sleep. While we are sleeping somebody else comes and
> > makes the buffer dirty again (OK, that is not probable, but I think it
> > could be possible).
>
> Yes, there's no reason why not at that point.
>
> > Now we wake up and call __cleanup_transaction().
> > It's not able to do anything and returns 0.
>
> I think the _right_ answer here is to have two separate checkpoint
> lists: the current one, plus one for which the checkpoint write has
> already been submitted. That way, we can wait for IO completion on
> submitted writes without (a) getting conned into doing multiple rewrites
> if there's somebody else dirtying the buffer; or (b) getting confused
> about how much progress we're making. Buffers on the pre-write list get
> written; buffers on the post-write list get waited for; and both count
> as progress (eliminating the false assert-failure when we failed to
> detect progress).
Yes, this seems to be a better long-term solution. A hotfix (retrying
after __flush_batch()) is attached if somebody is interested - it should
be safe and is lightly tested.

> The prevention of multiple writes in this case should also improve
> performance a little.
>
> That ought to be pretty straightforward, I think. The existing cases
> where we remove buffers from a checkpoint shouldn't have to care about
> which list_head we're removing from; those cases already handle buffers
> in both states. It's only when doing the flush/wait that we have to
> distinguish the two.
Yes, AFAICS the changes should remain local to the checkpointing code
(plus __unlink_buffer()). Should I write the patch or will you?

Honza
--
Jan Kara <jack@xxxxxxx>
SuSE CR Labs
Fix possible false assertion failure in JBD checkpointing code.

Signed-off-by: Jan Kara <jack@xxxxxxx>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.12-rc2/fs/jbd/checkpoint.c linux-2.6.12-rc2-checkpoint/fs/jbd/checkpoint.c
--- linux-2.6.12-rc2/fs/jbd/checkpoint.c 2005-03-03 18:58:29.000000000 +0100
+++ linux-2.6.12-rc2-checkpoint/fs/jbd/checkpoint.c 2005-04-05 13:26:42.000000000 +0200
@@ -339,8 +339,10 @@ int log_do_checkpoint(journal_t *journal
}
} while (jh != last_jh && !retry);

- if (batch_count)
+ if (batch_count) {
__flush_batch(journal, bhs, &batch_count);
+ retry = 1;
+ }

/*
* If someone cleaned up this transaction while we slept, we're