Re: RFC - how to balance Dirty+Writeback in the face of slow writeback.

From: Andrew Morton
Date: Fri Aug 18 2006 - 02:27:14 EST

On Fri, 18 Aug 2006 10:11:02 +1000
David Chinner <dgc@xxxxxxx> wrote:

> > Something like that covers the global dirty+writeback problem. The other
> > major problem space is the multiple-backing-device problem:
> >
> > a) One device is being written to heavily, another lightly
> >
> > b) One device is fast, another is slow.
>
> Once we are past the throttling threshold, the only thing that
> matters is whether we can write more data to the backing device(s).
> We should not really be allowing the input rate to exceed the output
> rate once we are past the throttle threshold.


But it seems really sad to block some process which is doing a really small
dirtying (say, some dopey atime update) just because some other process is
doing a huge write.

Now, things _usually_ work out all right, if only because of
balance_dirty_pages_ratelimited()'s logic. But it's more by happenstance
than by intent, and these sorts of interferences can happen.
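
For readers unfamiliar with the ratelimiting idea mentioned above, here is a
minimal userspace sketch of why a small dirtier usually escapes the expensive
throttling path: the full check only runs once every so many dirtied pages.
This is not the kernel's code. RATELIMIT_PAGES, struct dirtier and both
function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interval: run the expensive check once per this many pages. */
#define RATELIMIT_PAGES 32

struct dirtier {
    unsigned long pages_since_check; /* pages dirtied since the last full check */
    unsigned long full_checks;       /* how many times the expensive path ran */
};

/* Stand-in for the expensive global balancing/throttling step. */
static void full_balance_check(struct dirtier *d)
{
    d->full_checks++;
}

/* Called once per dirtied page; returns true when the full check ran. */
static bool dirty_page_ratelimited(struct dirtier *d)
{
    assert(d != NULL);

    if (++d->pages_since_check < RATELIMIT_PAGES)
        return false;            /* cheap path: small dirtiers usually stop here */

    d->pages_since_check = 0;
    full_balance_check(d);       /* only here could the caller be blocked */
    return true;
}
```

A process that dirties a single page takes the cheap path and is not blocked,
but nothing in this scheme guarantees that: the 32nd page from any caller
still lands in the full check regardless of who caused the backlog.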

> > To solve this properly we'd need to account for
> > dirty+writeback(+unstable?) pages on a per-backing-dev basis.
>
> We'd still need to account for them globally because we still need
> to be able to globally limit the amount of dirty data in the
> machine.
>
> FYI, I implemented a complex two-stage throttle on Irix a couple of
> years ago - it uses a per-device soft throttle threshold that is not
> enforced until the global dirty state passes a configurable limit.
> At that point, the per-device limits are enforced.
>
> This meant that devices with no dirty state attached to them could
> continue to dirty pages up to their soft-threshold, whereas heavy
> writers would be stopped until their backing devices fell back below
> the soft thresholds.
>
> Because the amount of dirty pages could continue to grow past safe
> limits if you had enough devices, there is also a global hard limit
> that cannot be exceeded and this throttles all incoming write
> requests regardless of the state of the device it was being written
> to.
>
> The problem with this approach is that the code was complex and
> difficult to test properly. Also, working out the default config
> values was an exercise in trial, error, workload measurement and
> guesswork that took some time to get right.
>
> The current linux code works as well as that two-stage throttle
> (better in some cases!) because of one main thing - bound request
> queue depth with feedback into the throttling control loop. Irix
> has neither of these so the throttle had to provide this accounting
> and limiting (soft throttle threshold).
>
> Hence I'm not sure that per-backing-device accounting and making
> decisions based on that accounting is really going to buy us much
> apart from additional complexity....
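
The two-stage throttle described in the quoted text can be sketched roughly as
follows. This is only an illustration of the decision logic as described, not
the Irix implementation. The struct layouts, field names and should_throttle()
are all hypothetical, and the limits are plain page counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bdev_state {
    unsigned long dirty;       /* dirty pages attributed to this device */
    unsigned long soft_limit;  /* per-device soft throttle threshold */
};

struct global_state {
    unsigned long dirty;          /* dirty pages across all devices */
    unsigned long enforce_limit;  /* above this, per-device soft limits apply */
    unsigned long hard_limit;     /* global ceiling; throttles every writer */
};

/* Returns true if a writer to 'dev' should be throttled before dirtying more. */
static bool should_throttle(const struct global_state *g,
                            const struct bdev_state *dev)
{
    assert(g != NULL && dev != NULL);
    assert(g->enforce_limit <= g->hard_limit);

    if (g->dirty >= g->hard_limit)
        return true;    /* hard limit: stop all writers, whatever the device */

    if (g->dirty < g->enforce_limit)
        return false;   /* soft limits are not enforced yet */

    /* Soft limits enforced: only writers to over-threshold devices wait. */
    return dev->dirty >= dev->soft_limit;
}
```

With global dirty state between the two limits, a writer to an idle device
proceeds while a writer to a device already over its soft threshold waits.
Picking sensible values for the three limits is the tuning problem the quoted
text complains about.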

hm, interesting.

It seems that the many-writers-to-different-disks workloads don't happen
very often. We know this because

a) The 2.4 performance is utterly awful, and I never saw anybody
complain and

b) 2.6 has the risk of filling all memory with under-writeback pages,
and nobody has complained about that either (iirc).

Relying on that observation and the request-queue limits has got us this
far but yeah, we should plug that PageWriteback windup scenario.

btw, Neil, has the PageWriteback windup actually been demonstrated? If so,