Re: [Bug #12604] Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB

From: Nick Piggin
Date: Wed Feb 11 2009 - 22:34:38 EST


On Wed, Feb 11, 2009 at 06:02:19PM -0800, Linus Torvalds wrote:
>
>
> On Thu, 12 Feb 2009, Nick Piggin wrote:
>
> > On Tue, Feb 10, 2009 at 05:28:30PM +0100, Jan Kara wrote:
> > > On Sun 08-02-09 20:21:42, Rafael J. Wysocki wrote:
> > > > This message has been generated automatically as a part of a report
> > > > of recent regressions.
> > > >
> > > > The following bug entry is on the current list of known regressions
> > > > from 2.6.28. Please verify if it still should be listed and let me know
> > > > (either way).
> > > Yes, I've verified with latest git and the regression is still there.
> >
> > I'm working on this FWIW...
>
> Shouldn't we just revert it? The code does look to be broken.
>
> It also looks like the interaction with that ever-buggy "nr_to_write"
> thing are totally wrong. I can see that whole
>
> if (!cycled) {
> ..
> index = 0;
> goto retry
> }
>
> doing all the wrong things: if we ever hit the combination of
> "!cycled + nr_to_write==0", we're always screwed. It simply _cannot_ do
> the right thing.

Doh, I think you spotted the bug. I was going off on the wrong track.


> I dunno. That whole piece-of-sh*t function has been incredibly buggy this
> release. The code is an unreadable mess, and I think that "cyclic" stuff
> is part of the reason for it being messy and buggy. Please convince me
> it's worth saving, or let me revert the whole stinking pile of crud?

Well... the cyclic writeout path apparently gets used by some filesystems
for data integrity writeout, so I don't think the problem my (broken)
commit addressed should simply be ignored.

Maybe you're being harsh on this function. It has had two thinko bugs
so far. Before this release cycle it seemed to have the record for the
most data integrity bugs in a single function...

How's this? Jan, can you test with the bdb workload please?

--

A bug was introduced into write_cache_pages cyclic writeout by commit
31a12666d8f0c22235297e1c1575f82061480029. The intention (and comments)
is that we should cycle back and look for more dirty pages at the
beginning of the file if there is no more work to be done.

But the !done condition was dropped from the test. This means that any
time the page writeout loop breaks early (e.g. because nr_to_write
reaches 0), we will set index to 0, then goto retry. This will set
done_index to index,
then find done is set, so will proceed to the end of the function. When
updating mapping->writeback_index for cyclic writeout, we now use
done_index == 0, so we're always cycling back to 0.
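To make the failure mode concrete, here is a toy model of the range_cyclic
control flow described above. This is a sketch only, not the real kernel
code: cyclic_writeback, NPAGES and the dirty[] array are invented for
illustration. The 'fixed' flag selects the patched "!cycled && !done" wrap
test instead of the buggy "!cycled".

```c
#include <assert.h>

#define NPAGES 8

/*
 * Toy model of the range_cyclic loop in write_cache_pages().
 * Scans dirty[] from 'start' to the end of the "file", wrapping back
 * to 0, "writing" pages until the nr_to_write budget is exhausted.
 * Returns the value that would be stored in mapping->writeback_index.
 */
static int cyclic_writeback(int dirty[NPAGES], int start, int nr_to_write,
			    int fixed)
{
	int index = start;
	int end = NPAGES - 1;
	int cycled = (start == 0);
	int done = 0;
	int done_index;

retry:
	done_index = index;
	while (!done && index <= end) {
		if (dirty[index]) {
			dirty[index] = 0;	/* "write" the page */
			if (--nr_to_write <= 0)
				done = 1;	/* budget exhausted: break */
		}
		index++;
		done_index = index;
	}
	if (fixed ? (!cycled && !done) : !cycled) {
		/* wrap around to catch pages dirtied before 'start' */
		cycled = 1;
		index = 0;
		end = start - 1;
		goto retry;
	}
	return done_index;
}
```

With start = 3 and a budget of 2 pages, the buggy test takes the wrap
branch even though done is set, so done_index is reset to 0 and writeout
restarts from the beginning of the file on every call; the fixed test
returns 5, resuming where the scan left off.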

This seemed to be causing random mmap writes (slapadd and iozone) to
start writing more pages from the LRU, and writeout would slow down.

With this patch, iozone random write performance is increased nearly
5x on my system (iozone -B -r 4k -s 64k -s 512m -s 1200m on ext2).

Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
---
Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c 2009-02-12 13:30:42.000000000 +1100
+++ linux-2.6/mm/page-writeback.c 2009-02-12 14:16:28.000000000 +1100
@@ -1079,7 +1079,7 @@ continue_unlock:
 		pagevec_release(&pvec);
 		cond_resched();
 	}
-	if (!cycled) {
+	if (!cycled && !done) {
 		/*
 		 * range_cyclic:
 		 * We hit the last page and there is more work to be done: wrap
--