Re: journal aborted, system read-only

From: Andrew Morton
Date: Tue Sep 21 2004 - 04:04:15 EST


"Stephen C. Tweedie" <sct@xxxxxxxxxx> wrote:
>
> Hi,
>
> On Sun, 2004-09-12 at 16:28, Gene Heskett wrote:
>
> > I just got up, and found advisories on every shell open that the
> > journal had encountered an error and aborted, converting my /
> > partition to read-only.
> ...
> > The kernel is 2.6.9-rc1-mm4. .config available on request.
>
> > This is precious little info to go on, but basically I'm wondering if
> > anyone else has encountered this?
>
> Well, we really need to see _what_ error the journal had encountered to
> be able to even begin to diagnose it. But 2.6.9-rc1-mm3 and -mm4 had a
> bug in the journaling introduced by low-latency work on the checkpoint
> code; can you try -mm5 or back out
> "journal_clean_checkpoint_list-latency-fix.patch" and try again?
>

Turns out this is due to the reworked buffer/page sleep/wakeup code in
recent -mm's. If the journal timer wakes kjournald while kjournald is
waiting on a read of a journal indirect block, kjournald just plunges ahead
with a still-locked, non-uptodate buffer. Which it treats as an I/O error,
and things don't improve from there.

This should fix it.

--- 25/kernel/wait.c~wait_on_bit-must-loop 2004-09-21 01:33:18.000000000 -0700
+++ 25-akpm/kernel/wait.c 2004-09-21 01:44:36.706435616 -0700
@@ -157,8 +157,9 @@ __wait_on_bit(wait_queue_head_t *wq, str
 	int ret = 0;
 
 	prepare_to_wait(wq, &q->wait, mode);
-	if (test_bit(q->key.bit_nr, q->key.flags))
+	do {
 		ret = (*action)(q->key.flags);
+	} while (test_bit(q->key.bit_nr, q->key.flags) && !ret);
 	finish_wait(wq, &q->wait);
 	return ret;
 }
_
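
To illustrate why the wait has to loop, here is a minimal userspace sketch of the two shapes of the function. It is a simulation, not kernel code: struct sim, sleep_once(), test_bit_sim(), wait_on_bit_once() and wait_on_bit_loop() are all hypothetical names made up for this example, and the "sleep" is just a counter that stands in for a wakeup arriving before the I/O has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOCK_BIT 0	/* hypothetical: bit 0 plays the role of the buffer lock bit */

struct sim {
	unsigned long flags;	/* LOCK_BIT is set while the simulated I/O is in flight */
	int wakeups_until_done;	/* the I/O completes on this many wakeups from now */
	int sleeps;		/* how many times the action was called */
};

static bool test_bit_sim(int nr, const unsigned long *word)
{
	return (*word >> nr) & 1UL;
}

/*
 * Simulated action: one sleep that ends on any wakeup, including an
 * unrelated one such as a timer. The bit is cleared only once the
 * countdown runs out. Always returns 0 (no signal, no error).
 */
static int sleep_once(struct sim *s)
{
	assert(s != NULL);
	s->sleeps++;
	if (--s->wakeups_until_done <= 0)
		s->flags &= ~(1UL << LOCK_BIT);
	return 0;
}

/* Pre-patch shape: test the bit once, sleep once, return regardless. */
static int wait_on_bit_once(struct sim *s)
{
	int ret = 0;

	if (test_bit_sim(LOCK_BIT, &s->flags))
		ret = sleep_once(s);
	return ret;
}

/* Post-patch shape: keep sleeping until the bit clears or the action fails. */
static int wait_on_bit_loop(struct sim *s)
{
	int ret = 0;

	do {
		ret = sleep_once(s);
	} while (test_bit_sim(LOCK_BIT, &s->flags) && !ret);
	return ret;
}
```

With a countdown greater than one, wait_on_bit_once() returns while the bit is still set, which corresponds to kjournald carrying on with a still-locked buffer. wait_on_bit_loop() returns only after the bit has cleared or the action has reported a non-zero result.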
