Re: regression in page writeback

From: Wu Fengguang
Date: Tue Sep 22 2009 - 04:05:31 EST

Next message: Andreas Mohr: "Re: [PATCH] PCI PM: Read device power state from register after updating it (rev. 2)"
Previous message: Ingo Molnar: "[origin tree build failure] drivers/built-in.o:(.data+0xb1f40):undefined reference to `dib0070_ctrl_agc_filter'"
In reply to: Peter Zijlstra: "Re: regression in page writeback"
Next in thread: Peter Zijlstra: "Re: regression in page writeback"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Sep 22, 2009 at 02:40:12PM +0800, Peter Zijlstra wrote:
> On Tue, 2009-09-22 at 13:49 +0800, Shaohua Li wrote:
> > Hi,
> > Commit d7831a0bdf06b9f722b947bb0c205ff7d77cebd8 causes disk io regression
> > in my test.
> > My system has 12 disks, and each disk has two partitions. The system runs fio
> > sequential writes on all partitions; each partition has 8 jobs.
> > 2.6.31-rc1, fio gives 460m/s disk io
> > 2.6.31-rc2, fio gives about 400m/s disk io. Revert the patch, speed back to
> > 460m/s
> >
> > Under latest git: fio gives 450m/s disk io; If reverting the patch, the speed
> > is 484m/s.
> >
> > With the patch, fio reports fewer io merges and more interrupts. My naive
> > analysis is that the patch makes balance_dirty_pages_ratelimited_nr() limit
> > the write chunk to 8 pages and then soon go to sleep in balance_dirty_pages(),
> > because most of the time bdi_nr_reclaimable < bdi_thresh, and so when the
> > pages are written out, the chunk is 8 pages long instead of 4M long. Without
> > the patch, a thread can write 8 pages, then move some pages to writeback, and
> > then continue writing. The patch seems to break this.
> >
> > Unfortunately I can't figure out a fix for this issue; hopefully you have more
> > ideas.
>
> This whole writeback business is very fragile,

Agreed, sorry..

> the patch does indeed cure a few cases and compounds a few other
> cases, typical trade off.
>
> People are looking at it.

Staring at the changelog, I don't think balance_dirty_pages() could
"overshoot its limits and move all the dirty pages to writeback".
Because it will break when enough pages are written:

		if (pages_written >= write_chunk)
			break;		/* We've done our duty */

The observed "overshooting" may well be the background_writeout()
behavior, which will hit the dirty numbers all the way down to 0.
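
To make the bound concrete, here is a minimal user-space sketch of that
exit condition. It is a toy model, not the kernel code: struct toy_bdi,
toy_writeback() and toy_balance() are hypothetical names, the global
background threshold check is left out, and the wait on congestion is
replaced by simply returning.

```c
#include <stddef.h>

/* Toy per-device counters; names are hypothetical, not kernel structures. */
struct toy_bdi {
	unsigned long dirty;		/* models bdi_nr_reclaimable */
	unsigned long writeback;	/* models bdi_nr_writeback */
	unsigned long thresh;		/* models bdi_thresh */
};

/*
 * Stand-in for one writeback pass: move up to nr_to_write pages from
 * dirty to writeback and return how many were moved.
 */
static unsigned long toy_writeback(struct toy_bdi *bdi,
				   unsigned long nr_to_write)
{
	unsigned long n = bdi->dirty < nr_to_write ? bdi->dirty : nr_to_write;

	bdi->dirty -= n;
	bdi->writeback += n;
	return n;
}

/*
 * Toy throttling loop for one caller. Returns the number of pages this
 * caller moved to writeback before leaving the loop.
 */
static unsigned long toy_balance(struct toy_bdi *bdi,
				 unsigned long write_chunk)
{
	unsigned long pages_written = 0;

	for (;;) {
		/* Under the limit: no throttling needed. */
		if (bdi->dirty + bdi->writeback <= bdi->thresh)
			break;
		/* Nothing left to write; a real caller would wait here. */
		if (bdi->dirty == 0)
			break;

		pages_written += toy_writeback(bdi, write_chunk);

		if (pages_written >= write_chunk)
			break;		/* We've done our duty */
	}
	return pages_written;
}
```

In this model, a caller that starts with 1000 dirty pages, a threshold
of 100 and a write_chunk of 12 leaves the loop after moving 12 pages,
with 988 pages still dirty.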

        mm: prevent balance_dirty_pages() from doing too much work

        balance_dirty_pages can overreact and move all of the dirty pages to
        writeback unnecessarily.

        balance_dirty_pages makes its decision to throttle based on the number of
        dirty plus writeback pages that are over the calculated limit, so it will
        continue to move pages even when there are plenty of pages in writeback
        and less than the threshold still dirty.

        This allows it to overshoot its limits and move all the dirty pages to
        writeback while waiting for the drives to catch up and empty the writeback
        list.

I'm not sure how this patch stopped the "overshooting" behavior.
Maybe it managed to not start the background pdflush, or the started
pdflush thread exited because it found writeback is in progress by
someone else?

-		if (bdi_nr_reclaimable) {
+		if (bdi_nr_reclaimable > bdi_thresh) {
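
For reference, the effect of that one-line change on a throttled caller
can be sketched as a small decision function. This is only an
illustration under simplifying assumptions: enum toy_action and
toy_decide() are hypothetical names, and the global background
threshold check is omitted.

```c
#include <stdbool.h>

/* What a throttled caller does on one pass through the loop (toy model). */
enum toy_action {
	TOY_NOT_THROTTLED,	/* dirty + writeback is within bdi_thresh */
	TOY_WRITE,		/* caller starts writeback itself */
	TOY_WAIT,		/* caller only waits for writeback to drain */
};

/*
 * dirty models bdi_nr_reclaimable, writeback models bdi_nr_writeback,
 * thresh models bdi_thresh. "patched" selects the condition after the
 * commit; otherwise the condition before the commit is used.
 */
static enum toy_action toy_decide(unsigned long dirty,
				  unsigned long writeback,
				  unsigned long thresh, bool patched)
{
	bool do_write;

	if (dirty + writeback <= thresh)
		return TOY_NOT_THROTTLED;

	if (patched)
		do_write = dirty > thresh;	/* if (bdi_nr_reclaimable > bdi_thresh) */
	else
		do_write = dirty > 0;		/* if (bdi_nr_reclaimable) */

	return do_write ? TOY_WRITE : TOY_WAIT;
}
```

With 50 dirty pages, 100 pages under writeback and a threshold of 100,
the old condition has the caller write and the new one has it wait,
which is the case Shaohua describes as happening most of the time.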

Thanks,
Fengguang
