[patch 2.4] Fix ext3 scheduling storm and lockup

From: Andrew Morton (akpm@digeo.com)
Date: Fri Jan 17 2003 - 20:05:16 EST


This patch fixes an inefficiency and potential system lockup in the 2.4
kernel's ext3 filesystem. The problem has been present since 2.4.20-pre5.
This patch is applicable to 2.4.20. A copy is at

http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.20/ext3-scheduling-storm.patch

Anyone who is using tasks which have realtime scheduling policy on ext3
systems should apply this change.

Details:

At the start of do_get_write_access() we have this logic:

        repeat:
                lock_buffer(jh->bh);
                ...
                unlock_buffer(jh->bh);
                ...
                if (jh->j_list == BJ_Shadow) {
                        sleep_on_buffer(jh->bh);
                        goto repeat;
                }

The problem is that the unlock_buffer() will wake up anyone who is sleeping
in the sleep_on_buffer().

So if task A is asleep in sleep_on_buffer() and task B now runs
do_get_write_access(), task B will wake task A by accident. Task B will then
sleep on the buffer and task A will loop, will run unlock_buffer() and then
wake task B.

Net effect: the system does 100,000 context switches/sec until I/O completes
against the buffer and kjournald changes the value of jh->j_list.

Unless task A and task B happen to both have realtime scheduling policy - if
they do then kjournald will never run. The state is never cleared and your
box locks up.

The fix is to not do the `goto repeat;' until the buffer has been taken off
the shadow list. So we don't go and wake up the other waiter(s) until they
can actually proceed to use the buffer.

diff -puN fs/jbd/transaction.c~ext3-scheduling-storm fs/jbd/transaction.c
--- 24/fs/jbd/transaction.c~ext3-scheduling-storm 2003-01-16 02:45:19.000000000 -0800
+++ 24-akpm/fs/jbd/transaction.c 2003-01-16 02:45:19.000000000 -0800
@@ -669,7 +669,8 @@ repeat:
                         spin_unlock(&journal_datalist_lock);
                         unlock_journal(journal);
                         /* commit wakes up all shadow buffers after IO */
- sleep_on(&jh2bh(jh)->b_wait);
+ wait_event(jh2bh(jh)->b_wait,
+ jh->b_jlist != BJ_Shadow);
                         lock_journal(journal);
                         goto repeat;
                 }

_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Jan 23 2003 - 22:00:17 EST