Re: RFC - how to balance Dirty+Writeback in the face of slow writeback.

From: Andrew Morton
Date: Fri Aug 18 2006 - 02:27:14 EST


On Fri, 18 Aug 2006 10:11:02 +1000
David Chinner <dgc@xxxxxxx> wrote:

>
> > Something like that covers the global dirty+writeback problem. The other
> > major problem space is the multiple-backing-device problem:
> >
> > a) One device is being written to heavily, another lightly
> >
> > b) One device is fast, another is slow.
>
> Once we are past the throttling threshold, the only thing that
> matters is whether we can write more data to the backing device(s).
> We should not really be allowing the input rate to exceed the output
> rate once we are past the throttle threshold.

True.

But it seems really sad to block some process which is doing a really small
dirtying (say, some dopey atime update) just because some other process is
doing a huge write.

Now, things _usually_ work out all right, if only because of
balance_dirty_pages_ratelimited()'s logic. But it's more by happenstance
than by intent, and these sorts of interferences can happen.
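
To make the rate-limiting point concrete, here is a toy user-space model,
not the kernel's code. It assumes a simple counter of dirtied pages and
only asks for the full (possibly blocking) check once that counter crosses
an interval. The struct, its fields and dirty_check_due() are hypothetical
names; the kernel's own bookkeeping differs in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter, for illustration only. */
struct dirty_ratelimit {
    unsigned long pending;   /* pages dirtied since the last full check */
    unsigned long interval;  /* pages allowed between full checks */
};

/*
 * Record nr_pages newly dirtied pages.  Returns true when the caller
 * should run the full dirty-balance check, which is the only place it
 * could be made to block.
 */
static bool dirty_check_due(struct dirty_ratelimit *rl, unsigned long nr_pages)
{
    assert(rl->interval > 0);

    rl->pending += nr_pages;
    if (rl->pending < rl->interval)
        return false;        /* cheap path: a small dirtier carries on */

    rl->pending = 0;         /* start a new interval */
    return true;
}
```

With an interval of 32, a caller that dirties one page at a time takes the
cheap path 31 times out of 32, while a large write reaches the full check
on its first call. Nothing in the counter knows who is small and who is
huge, which is why the protection is incidental rather than designed.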

> > To solve this properly we'd need to account for
> > dirty+writeback(+unstable?) pages on a per-backing-dev basis.
>
> We'd still need to account for them globally because we still need
> to be able to globally limit the amount of dirty data in the
> machine.
>
> FYI, I implemented a complex two-stage throttle on Irix a couple of
> years ago - it uses a per-device soft throttle threshold that is not
> enforced until the global dirty state passes a configurable limit.
> At that point, the per-device limits are enforced.
>
> This meant that devices with no dirty state attached to them could
> continue to dirty pages up to their soft-threshold, whereas heavy
> writers would be stopped until their backing devices fell back below
> the soft thresholds.
>
> Because the amount of dirty pages could continue to grow past safe
> limits if you had enough devices, there is also a global hard limit
> that cannot be exceeded and this throttles all incoming write
> requests regardless of the state of the device it was being written
> to.
>
> The problem with this approach is that the code was complex and
> difficult to test properly. Also, working out the default config
> values was an exercise in trial, error, workload measurement and
> guesswork that took some time to get right.
>
> The current linux code works as well as that two-stage throttle
> (better in some cases!) because of one main thing - bound request
> queue depth with feedback into the throttling control loop. Irix
> has neither of these so the throttle had to provide this accounting
> and limiting (soft throttle threshold).
>
> Hence I'm not sure that per-backing-device accounting and making
> decisions based on that accounting is really going to buy us much
> apart from additional complexity....
>
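
The two-stage scheme described above can be sketched roughly as follows.
This is a minimal user-space illustration under stated assumptions, not
the Irix code: the struct names, fields and should_throttle() are all
hypothetical, and "dirty" stands for whatever per-device dirty+writeback
count an implementation would track.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state, for illustration only. */
struct bdev_state {
    unsigned long dirty;       /* dirty + writeback pages on this device */
    unsigned long soft_limit;  /* per-device soft threshold */
};

/* Hypothetical global configuration. */
struct throttle_cfg {
    unsigned long enforce_at;  /* global level at which soft limits apply */
    unsigned long hard_limit;  /* global ceiling that stops every writer */
};

/* Total dirty pages across all devices. */
static unsigned long global_dirty(const struct bdev_state *devs, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += devs[i].dirty;
    return total;
}

/* Should a writer dirtying pages on devs[idx] be made to wait? */
static bool should_throttle(const struct throttle_cfg *cfg,
                            const struct bdev_state *devs,
                            size_t n, size_t idx)
{
    assert(idx < n);
    assert(cfg->enforce_at <= cfg->hard_limit);

    unsigned long total = global_dirty(devs, n);

    if (total >= cfg->hard_limit)
        return true;           /* hard limit: throttle regardless of device */
    if (total < cfg->enforce_at)
        return false;          /* soft limits are not enforced yet */

    /* Soft limits enforced: only devices over their own threshold wait. */
    return devs[idx].dirty >= devs[idx].soft_limit;
}
```

Between enforce_at and hard_limit, a writer to a heavily dirtied device
waits while a writer to an idle device keeps going; at hard_limit both
wait. The sketch also shows where the tuning pain comes from: every
device needs a soft_limit, and both global values have to be chosen
against real workloads.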

hm, interesting.

It seems that the many-writers-to-different-disks workloads don't happen
very often. We know this because

a) The 2.4 performance is utterly awful, and I never saw anybody
complain and

b) 2.6 has the risk of filling all memory with under-writeback pages,
and nobody has complained about that either (iirc).

Relying on that observation and the request-queue limits has got us this
far but yeah, we should plug that PageWriteback windup scenario.

btw, Neil, has the PageWriteback windup actually been demonstrated? If so,
how?