Re: [PATCH 3/5] writeback: dirty rate control

From: Peter Zijlstra
Date: Tue Aug 09 2011 - 10:58:14 EST


On Sat, 2011-08-06 at 16:44 +0800, Wu Fengguang wrote:
>
> Estimation of balanced bdi->dirty_ratelimit
> ===========================================
>
> When started N dd, throttle each dd at
>
> task_ratelimit = pos_bw (any non-zero initial value is OK)

This is (0), since it makes (1). But it fails to explain what the
difference is between task_ratelimit and pos_bw (and why positional
bandwidth is a good name).

> After 200ms, we got
>
> dirty_bw = # of pages dirtied by app / 200ms
> write_bw = # of pages written to disk / 200ms

Right, so that I get. And our premise for the whole work is to delay
applications so that we match the dirty_bw to the write_bw, right?

> For aggressive dirtiers, the equality holds
>
> dirty_bw == N * task_ratelimit
> == N * pos_bw (1)

So dirty_bw is in pages/s, so task_ratelimit should also be in pages/s,
since N is a unit-less number.

What does task_ratelimit in pages/s mean? Since we make the tasks sleep
the only thing we can make from this is a measure of pages. So I expect
(in a later patch) we compute the sleep time on the amount of pages we
want written out, using this ratelimit measure, right?

> The balanced throttle bandwidth can be estimated by
>
> ref_bw = pos_bw * write_bw / dirty_bw (2)

Here you introduce reference bandwidth, what does it mean and what is
its relation to positional bandwidth. Going by the equation, we got
(pages/s * pages/s) / (pages/s) so we indeed have a bandwidth unit.

write_bw/dirty_bw is the ration between output and input of dirty pages,
but what is pos_bw and what does that make ref_bw?

> >From (1) and (2), we get equality
>
> ref_bw == write_bw / N (3)

Somehow this seems like the primary postulate, yet you present it like a
derivation. The whole purpose of your control system is to provide this
fairness between processes, therefore I would expect you start out with
this postulate and reason therefrom.

> If the N dd's are all throttled at ref_bw, the dirty/writeback rates
> will match. So ref_bw is the balanced dirty rate.

Which does lead to the question why its not called that instead ;-)

> In practice, the ref_bw calculated by (2) may fluctuate and have
> estimation errors. So the bdi->dirty_ratelimit update policy is to
> follow it only when both pos_bw and ref_bw point to the same direction
> (indicating not only the dirty position has deviated from the global/bdi
> setpoints, but also it's still departing away).

Which is where you introduce the need for pos_bw, yet you have not yet
explained its meaning. In this explanation you allude to it being the
speed (first time derivative) of the deviation from the setpoint.

The set point's measure is in pages, so the measure of its first time
derivative would indeed be pages/s, just like bandwidth, but calling it
a bandwidth seems highly confusing indeed.

I would also like a few more words on your update condition, why did you
pick those, and what are the full ramifications of them.

Also missing in this story is your pos_ratio thing, it is used in the
code, but there is no explanation on how it ties in with the above
things.


You seem very skilled in control systems (your earlier read-ahead work
was also a very complex system), but the explanations of your systems
are highly confusing. Can you go back to the roots and explain how you
constructed your model and why you did so? (without using graphs please)


PS. I'm not criticizing your work, the results are impressive (as
always), but I find it very hard to understand.

PPS. If it would help, feel free to refer me to educational material on
control system theory, either online or in books.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/