Re: vm kills processes in our 2.3.12 port of reiserfs - what was

Andrea Arcangeli (andrea@suse.de)
Mon, 30 Aug 1999 15:21:02 +0200 (MEST)


On Sun, 29 Aug 1999, Hans Reiser wrote:

>We have a port of reiserfs to 2.3.12 in which dbench suffers from processes
>getting killed due to not being able to get memory. There were some changes to
>mark_buffer_dirty() in the 2.2 series that got dropped from the 2.3 series. The
>2.2 version, as I remember your telling me, was intended to prevent this very
>problem. What is the story on this? This problem doesn't happen with ext2
>with the same number of clients, but I don't know how much that means. Does
>ext2 do something now to avoid the issue?

I think it's because mark_buffer_dirty() doesn't enforce a limit on the
growth of dirty buffers in the system.

Basically the buffer code checks whether there are too many dirty buffers
only for data writes (because data writes usually pass through
block_write_partial_page or one of the other equivalent filesystem-helper
functions in 2.3.x).

So if your filesystem writes 120 Mbyte of metadata before the first data
write, then you can fill the buffer cache with 120 Mbyte of dirty buffers
without ever blocking the application to wait for write-I/O completion.
This unlimited growth of unfreeable memory in the cache may leave the VM
subsystem unable to recycle the cache in time, and so the tasks
that need memory will be killed (as happened in early 2.2.x).

I think ext2 doesn't trigger this problem easily because usually, after
a few metadata-inode writes, you'll find yourself doing a data write
as well. The problem may show up on ext2 too when writing lots of
directory entries.
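
The difference between the two write patterns above can be sketched with a
toy userspace model. This is only an illustration under stated assumptions:
DIRTY_LIMIT, toy_peak_dirty() and the instantaneous clamp that stands in for
"block until write-out catches up" are all invented here; none of them is
kernel code or a real kernel constant.

```c
#include <assert.h>

enum { DIRTY_LIMIT = 100 };  /* hypothetical threshold, counted in buffers */

/*
 * Simulate `rounds` rounds, each made of `meta_per_data` metadata writes
 * (no limit check) followed by one data write (limit check).
 * Returns the highest number of dirty buffers seen at any point.
 */
static int toy_peak_dirty(int rounds, int meta_per_data)
{
    int dirty = 0, peak = 0;
    int r, m;

    for (r = 0; r < rounds; r++) {
        for (m = 0; m < meta_per_data; m++) {
            dirty++;                  /* metadata path: nobody checks the limit */
            if (dirty > peak)
                peak = dirty;
        }

        dirty++;                      /* the data write dirties a buffer too */
        if (dirty > peak)
            peak = dirty;

        if (dirty > DIRTY_LIMIT)      /* only the data path throttles */
            dirty = DIRTY_LIMIT;      /* model: writer waits for write-out */

        assert(dirty <= DIRTY_LIMIT); /* holds only right after a data write */
    }
    return peak;
}
```

Under these assumptions, an interleaved pattern such as toy_peak_dirty(1000, 3)
stays within a few buffers of the limit, while a single metadata burst such as
toy_peak_dirty(1, 5000) lets the dirty count climb to the full size of the
burst before anything throttles it.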

Side note: the b_dev information was already unused in
balance_dirty() (originally it was used to enforce a more restrictive
rule in the presence of loop-device buffers, but I agree such code didn't
make much sense and I am happy it's been removed).

I think something like this should cure your problem. Patch against 2.3.15
(untested).

diff -ur 2.3.15/fs/block_dev.c 2.3.15-balance_dirty/fs/block_dev.c
--- 2.3.15/fs/block_dev.c Tue Jun 22 19:45:40 1999
+++ 2.3.15-balance_dirty/fs/block_dev.c Mon Aug 30 14:56:06 1999
@@ -124,7 +124,6 @@
}
buffercount=0;
}
- balance_dirty(dev);
if (write_error)
break;
}
diff -ur 2.3.15/fs/buffer.c 2.3.15-balance_dirty/fs/buffer.c
--- 2.3.15/fs/buffer.c Mon Aug 23 20:15:53 1999
+++ 2.3.15-balance_dirty/fs/buffer.c Mon Aug 30 15:03:03 1999
@@ -834,7 +834,7 @@
*/
static int too_many_dirty_buffers;

-void balance_dirty(kdev_t dev)
+static void balance_dirty(void)
{
int dirty = nr_buffers_type[BUF_DIRTY];
int ndirty = bdf_prm.b_un.ndirty;
@@ -861,6 +861,7 @@
void __mark_buffer_dirty(struct buffer_head *bh, int flag)
{
__mark_dirty(bh, flag);
+ balance_dirty();
}

/*
@@ -1486,7 +1487,7 @@
if (!test_and_set_bit(BH_Dirty, &bh->b_state)) {
__mark_dirty(bh, 0);
if (too_many_dirty_buffers)
- balance_dirty(bh->b_dev);
+ balance_dirty();
}

if (err) {
@@ -1656,7 +1657,7 @@
if (!test_and_set_bit(BH_Dirty, &bh->b_state)) {
__mark_dirty(bh, 0);
if (too_many_dirty_buffers)
- balance_dirty(bh->b_dev);
+ balance_dirty();
}

if (err) {
diff -ur 2.3.15/include/linux/fs.h 2.3.15-balance_dirty/include/linux/fs.h
--- 2.3.15/include/linux/fs.h Wed Aug 25 02:09:19 1999
+++ 2.3.15-balance_dirty/include/linux/fs.h Mon Aug 30 15:01:41 1999
@@ -803,7 +803,6 @@
__mark_buffer_dirty(bh, flag);
}

-extern void balance_dirty(kdev_t);
extern int check_disk_change(kdev_t);
extern int invalidate_inodes(struct super_block *);
extern void invalidate_inode_pages(struct inode *);
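
The intended effect of moving the check into __mark_buffer_dirty() can be
sketched with a toy userspace model. Everything below is an assumption made
for illustration: the toy_* names and DIRTY_LIMIT are invented, and whatever
the real balance_dirty() does to get buffers written out is collapsed into an
instantaneous clamp.

```c
#include <assert.h>

enum { DIRTY_LIMIT = 100 };  /* hypothetical threshold, counted in buffers */

static int toy_dirty;        /* stand-in for nr_buffers_type[BUF_DIRTY] */
static int toy_peak;         /* highest toy_dirty value seen so far */

/* Stand-in for balance_dirty(): pretend the caller waits for write-out. */
static void toy_balance_dirty(void)
{
    if (toy_dirty > DIRTY_LIMIT)
        toy_dirty = DIRTY_LIMIT;
}

/* Stand-in for the patched __mark_buffer_dirty(): every caller is checked. */
static void toy_mark_buffer_dirty(void)
{
    toy_dirty++;
    if (toy_dirty > toy_peak)
        toy_peak = toy_dirty;
    toy_balance_dirty();
}

/* Dirty n buffers in a row (data or metadata alike) and report the peak. */
static int toy_peak_after_writes(int n)
{
    int i;

    toy_dirty = 0;
    toy_peak = 0;
    for (i = 0; i < n; i++)
        toy_mark_buffer_dirty();

    assert(toy_dirty <= DIRTY_LIMIT);
    return toy_peak;
}
```

In this model the peak never exceeds the limit by more than one buffer, however
long the run of writes is, because the check no longer depends on which path
dirtied the buffer.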

Andrea
