Re: linux 2.2.18-pre17: "Kernel panic: LRU list corrupted"

From: Andrea Arcangeli (andrea@suse.de)
Date: Tue Oct 24 2000 - 22:06:31 EST


On Mon, Oct 23, 2000 at 02:20:17PM -0700, H. Peter Anvin wrote:
> Hi there,
>
> I wanted to let you know that I was trying 2.2.18-pre17 on
> hera.kernel.org, a uniprocessor with an SMP motherboard. After about six
> hours, it went catatonic, responding to pings and TCP SYNs but not doing
> anything that required user space.
>
> On the console, it had multiple copies of the message:
>
> "Kernel panic: LRU list corrupted" [fs/buffer.c:438]
>
> ... but no register dump.
>
> I have fallen back to 2.2.17 and it has run stably for a few days now.

I found one bug that can cause this kind of corruption and lockup. It is in
2.2.17 too (and it was in the 2.2.18pre*aa kernels as well, although a VM
change I made there made it extremely hard to reproduce).

I fixed it in 2.2.18pre17aa1 (I suggest giving 2.2.18pre17aa1 a try, btw).

I also included the fix in a new VM-global patch against vanilla 2.2.18pre17
(the VM-global patch is also available as a single patch inside the
2.2.18pre17aa1/ directory, but I have to maintain a separate version of it
against clean 2.2.18pre17 due to silly rejects that I can't avoid):

        ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre17/VM-global-2.2.18pre17-7.bz2

(The way I could reproduce the hang with 2.2.18pre17aa1 was by testing
LVM snapshotting, because while an LV is under snapshot [as also while using
raid5] WRITEA will block too.)

Vanilla 2.2.18pre17 can hit this bug an order of magnitude more easily,
since it blocks there all the time, and I had to partly change that blocking
behaviour in my tree for performance reasons. That's why people reported that
the VM-global patch "cured" the problem. But really it still had a small
window for the bug too.

So now I have ported the strict fix to clean 2.2.18pre17. It's untested, but
I'm almost sure it will fix the problem there too.

--- 2.2.18pre17/fs/buffer.c.~1~ Tue Sep 5 02:28:47 2000
+++ 2.2.18pre17/fs/buffer.c Wed Oct 25 04:38:34 2000
@@ -1468,10 +1468,13 @@
 #define BUFFER_BUSY_BITS ((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected))
 #define buffer_busy(bh) ((bh)->b_count || ((bh)->b_state & BUFFER_BUSY_BITS))
 
-static int sync_page_buffers(struct buffer_head *bh, int wait)
+static int sync_page_buffers(struct page * page, int wait)
 {
+        struct buffer_head * bh = page->buffers;
         struct buffer_head * tmp = bh;
 
+        page->buffers = NULL;
+
         do {
                 struct buffer_head *p = tmp;
                 tmp = tmp->b_this_page;
@@ -1482,6 +1485,8 @@
                         ll_rw_block(WRITE, 1, &p);
         } while (tmp != bh);
 
+        page->buffers = bh;
+
         do {
                 struct buffer_head *p = tmp;
                 tmp = tmp->b_this_page;
@@ -1533,7 +1538,7 @@
  busy:
         too_many = (nr_buffers * bdf_prm.b_un.nfract/100);
 
-        if (!sync_page_buffers(bh, wait)) {
+        if (!sync_page_buffers(page_map, wait)) {
 
                 /* If a high percentage of the buffers are dirty,
                  * wake kflushd

The strict version of the fix shown above can also be downloaded from here:

        ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre17/strict-VM-corruption-fix-1

Andrea