Re: Linux 2.6.29

From: Andrew Morton
Date: Thu Mar 26 2009 - 21:06:11 EST


On Thu, 26 Mar 2009 17:51:44 -0700 (PDT) Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

>
>
> On Thu, 26 Mar 2009, Linus Torvalds wrote:
> >
> > The only times tunables have worked for us is when they auto-tune.
> >
> > IOW, we don't have "use 35% of memory for buffer cache" tunables, we just
> > dynamically auto-tune memory use. And no, we don't expect user space to
> > run some "tuning program for their load" either.
>
> IOW, what we could reasonably do is something along the lines of:
>
> - start off with some reasonable value for max background dirty (per
> block device) that defaults to something sane (quite possibly based on
> simply memory size).
>
> - assume that "foreground dirty" is just always 2* background dirty.
>
> - if we hit the "max foreground dirty" during memory allocation, then we
> shrink the background dirty value (logic: we never want to have to wait
> synchronously)
>
> - if we hit some maximum latency on writeback, shrink dirty aggressively
> and based on how long the latency was (because at that point we have a
> real _measure_ of how costly it is with that load).
>
> - if we start doing background dirtying, but never hit the foreground
> dirty even in dirty balancing (ie when a writer is actually _writing_,
> as opposed to hitting it when allocating memory by a non-writer), then
> slowly open up the window - we may be limiting too early.
>
> .. add heuristics to taste. The point being, that if we do this based on
> real loads, and based on hitting the real problems, then we might actually
> be getting somewhere. In particular, if the filesystem sucks at writeout
> (ie the limiter is not the _disk_, but the filesystem serialization), then
> it should automatically also shrink the max dirty state.
>
> The tunable then could become the maximum latency we accept or something
> like that. Or the hysteresis limits/rules for the soft "grow" or "shrink"
> events. At that point, maybe we could even find something that works for
> most people.
>
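
To make the rules quoted above concrete, here is a rough userspace
sketch. Nothing in it is existing kernel code: struct dirty_tune, the
function names, and the shrink/grow step sizes (1/4 and 1/16) are all
made up for illustration. The only tunable left is the maximum
acceptable writeback latency.

```c
#include <stdio.h>

/* Hypothetical per-block-device state; these names are assumptions. */
struct dirty_tune {
	unsigned long bg_limit;       /* background dirty limit, in pages */
	unsigned long min_limit;      /* never shrink below this */
	unsigned long max_limit;      /* never grow above this */
	unsigned long max_latency_ms; /* the one remaining tunable */
};

/* "foreground dirty" is just always 2 * background dirty. */
static unsigned long fg_limit(const struct dirty_tune *t)
{
	return 2 * t->bg_limit;
}

static void clamp_limit(struct dirty_tune *t)
{
	if (t->bg_limit < t->min_limit)
		t->bg_limit = t->min_limit;
	if (t->bg_limit > t->max_limit)
		t->bg_limit = t->max_limit;
}

/* A memory allocation hit the foreground limit: shrink by 1/4 (arbitrary). */
static void hit_fg_in_alloc(struct dirty_tune *t)
{
	t->bg_limit -= t->bg_limit / 4;
	clamp_limit(t);
}

/*
 * Writeback took 'ms' milliseconds.  If that exceeds the accepted
 * latency, shrink in proportion to the overshoot: twice the accepted
 * latency halves the limit.
 */
static void writeback_latency(struct dirty_tune *t, unsigned long ms)
{
	if (ms <= t->max_latency_ms)
		return;
	t->bg_limit = t->bg_limit * t->max_latency_ms / ms;
	clamp_limit(t);
}

/*
 * Background writeback ran, but no writer hit the foreground limit in
 * dirty balancing: slowly open the window by 1/16 (arbitrary).
 */
static void bg_without_fg(struct dirty_tune *t)
{
	t->bg_limit += t->bg_limit / 16 + 1;
	clamp_limit(t);
}
```

The asymmetry is deliberate in this sketch: shrinking is fast and
driven by a measured latency, growing is slow, so one bad writeback
costs many quiet periods to undo.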

hm.

It may not be too hard to account for seekiness. Simplest case: if we
dirty a page and that page is file-contiguous to another already dirty
page then don't increment the dirty page count by "1": increment it by
0.01.
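
Something like this toy, perhaps. It is a userspace sketch, not
pagecache code: struct toy_file, toy_set_page_dirty() and the
fixed-point scale are invented here, and "file-contiguous" is taken to
mean that the page at index - 1 or index + 1 is already dirty. The
count is kept in hundredths of a page so that 0.01 stays an integer.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES       64   /* size of the toy file, in pages */
#define DIRTY_UNIT   100  /* an isolated dirty page costs 1.00 */
#define DIRTY_CONTIG 1    /* a file-contiguous dirty page costs 0.01 */

/* Hypothetical stand-in for a file's pagecache state. */
struct toy_file {
	bool dirty[NPAGES];
	unsigned long weighted_dirty; /* in hundredths of a page */
};

static void toy_set_page_dirty(struct toy_file *f, size_t idx)
{
	bool contig;

	/* Out of range, or already dirty: nothing new to account. */
	if (idx >= NPAGES || f->dirty[idx])
		return;

	/* Contiguous if either neighbouring page is already dirty. */
	contig = (idx > 0 && f->dirty[idx - 1]) ||
		 (idx + 1 < NPAGES && f->dirty[idx + 1]);

	f->dirty[idx] = true;
	f->weighted_dirty += contig ? DIRTY_CONTIG : DIRTY_UNIT;
}
```

Note the toy never decrements: a real version would have to decide
what to subtract when a page in the middle of a run is cleaned.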

Another simple case would be to keep track of the _number_ of dirty
inodes rather than simply lumping all dirty pages together.

And then there's metadata. The dirty balancing code doesn't account
for dirty inodes _at all_ at present.

(Many years ago there was a bug wherein we could have zillions of dirty
inodes and exactly zero dirty pages, and the writeback code wouldn't
trigger at all. The inodes would just sit there until a page got
dirtied. That bug might still be there.)
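
A minimal sketch of counting dirty inodes alongside dirty pages, so
that inodes alone can push us over the background limit. All of it is
hypothetical: struct dirty_state, INODE_COST_PAGES and the value 8 are
placeholders, not anything the dirty balancing code has today.

```c
#include <stdbool.h>

/* Assumed cost of one dirty inode, in page-equivalents (made up). */
#define INODE_COST_PAGES 8

/* Hypothetical global dirty accounting. */
struct dirty_state {
	unsigned long dirty_pages;
	unsigned long dirty_inodes;
};

/* Total dirty "cost": pages plus a per-inode charge for metadata. */
static unsigned long dirty_cost(const struct dirty_state *s)
{
	return s->dirty_pages + s->dirty_inodes * INODE_COST_PAGES;
}

/* True when background writeback should be started. */
static bool over_background_limit(const struct dirty_state *s,
				  unsigned long limit)
{
	return dirty_cost(s) > limit;
}
```

With this, 1000 dirty inodes and zero dirty pages against a limit of
4096 pages does trigger writeback, which is the case the old bug
missed.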


Then again, perhaps we don't need all those discrete heuristic things.
Maybe it can all be done in mark_buffer_dirty(). Do some clever
math+data-structure to track the seekiness of our dirtiness. Delayed
allocation would mess that up though.
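
One possible shape for that, as a userspace toy: count the number of
contiguous runs of dirty blocks, on the assumption that each run costs
roughly one seek. struct toy_dev and toy_mark_block_dirty() are
invented for the example; this is not what mark_buffer_dirty() does.
It also needs a block number at dirtying time, which is exactly what
delayed allocation doesn't have yet.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 128 /* size of the toy device, in blocks */

/* Hypothetical per-device dirty tracking. */
struct toy_dev {
	bool dirty[NBLOCKS];
	unsigned long dirty_blocks; /* plain count of dirty blocks */
	unsigned long dirty_runs;   /* contiguous runs, ~ seeks needed */
};

static void toy_mark_block_dirty(struct toy_dev *d, size_t blk)
{
	bool left, right;

	/* Out of range, or already dirty: nothing changes. */
	if (blk >= NBLOCKS || d->dirty[blk])
		return;

	left = blk > 0 && d->dirty[blk - 1];
	right = blk + 1 < NBLOCKS && d->dirty[blk + 1];

	d->dirty[blk] = true;
	d->dirty_blocks++;

	if (!left && !right)
		d->dirty_runs++;  /* isolated block starts a new run */
	else if (left && right)
		d->dirty_runs--;  /* block joins two runs into one */
	/* exactly one dirty neighbour: extends a run, count unchanged */
}
```

Dirty balancing could then weigh dirty_runs rather than dirty_blocks,
so a thousand scattered blocks count for more than a thousand
sequential ones.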
