Re: vm_dirty_ratio seems a bit large.

From: Andrew Morton
Date: Thu Mar 17 2005 - 16:32:54 EST


Robin Holt <holt@xxxxxxx> wrote:
>
> Andrew,
>
> We have some fairly large installations that are hitting long pauses
> in fsync(). One of the issues noted is that vm_dirty_ratio, while
> probably adequate for a desktop-type installation, seems excessively
> large for bigger configurations. For your reference,
> the machine that first reported this is running with 384GB of memory.
> Others that reported the problem range from 256GB to 4TB. At those sizes,
> we are talking dirty buffers in the range of 100GB to 1TB. That seems
> a bit excessive.

I'd have thought that dirty_background_ratio is the problem here: you want
pdflush to kick in earlier to start the I/O while permitting the write()ing
application to keep running.
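
To make the arithmetic concrete: both knobs are straight percentages of
memory, so a standalone sketch of how the two thresholds fall out on a
384GB machine (the defaults and the calculation here are illustrative,
not the kernel's actual code):

#include <stdio.h>

int main(void)
{
	/* 384GB of memory in 4K pages */
	unsigned long total_pages = (384UL << 30) >> 12;
	int vm_dirty_ratio = 40;		/* writers block above this */
	int dirty_background_ratio = 10;	/* pdflush starts here */

	unsigned long dirty = total_pages * vm_dirty_ratio / 100;
	unsigned long background = total_pages * dirty_background_ratio / 100;

	/* ~150GB may be dirty before a writer is throttled, and ~38GB
	 * before pdflush even starts - hence the long fsync() stalls. */
	printf("background: %lu pages (%lu GB), dirty: %lu pages (%lu GB)\n",
	       background, background >> 18, dirty, dirty >> 18);
	return 0;
}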

> Is there any chance of limiting vm_dirty_ratio to something other than
> a hard-coded 40%? Maybe add something like the following two lines to
> the beginning of page_writeback_init(). This would limit us to roughly
> 2GB of dirty buffers. I picked that number assuming that nobody would
> want to affect machines in the 4GB and below range.
>
>
> 	/* multiply before dividing: integer division would truncate to 0 */
> 	vm_dirty_ratio = min(40, TWO_GB_IN_PAGES * 100 / total_pages);
> dirty_background_ratio = vm_dirty_ratio / 4;

All that dirty pagecache allows us to completely elide I/O when overwrites
are happening, to get better request queue merging, to get better file
layout if the fs does allocate-on-flush and, probably most importantly, to
avoid I/O completely for short-lived files.
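
The short-lived-file point is easy to demonstrate: data that is written
and deleted before writeback catches up never touches the disk at all.
A minimal sketch (path and sizes arbitrary):

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char path[] = "/tmp/scratch-XXXXXX";
	char buf[4096];
	int fd = mkstemp(path);

	if (fd < 0)
		return 1;
	memset(buf, 'x', sizeof(buf));
	write(fd, buf, sizeof(buf));	/* dirties one page of pagecache */
	unlink(path);			/* deleted before writeback... */
	close(fd);			/* ...so no block I/O ever happens */
	return 0;
}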

So I'm sure there's someone out there who will say "hey, how come my
seeky-writing application just got 75% slower?".

That being said, perhaps reducing the default will help more people than it
hurts - I simply do not know. That's why it's tuneable ;)

Would it be correct to assume that these applications are simply doing
large, linear writes? If so, do they write quickly or at a relatively slow
rate? The latter, I assume.

Which fs are you using?

Other things we can think about are:

- Setting the dirty limit on a per-inode basis (non-trivial).

- Adding a new fadvise command to start async writeback of a section of
  the file (easy; a rough userspace sketch below).
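
For the second item, the userspace side might look something like this.
POSIX_FADV_WRITEBEHIND is made up for illustration - no such fadvise
command exists - but posix_fadvise() itself is the real entry point:

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical advice value standing in for the proposed
 * "start async writeback of this range" command. */
#define POSIX_FADV_WRITEBEHIND	32

/* Write a chunk, then kick off background writeback for just that
 * range, so a later fsync() finds most of it already on disk. */
static int write_behind(int fd, const void *buf, size_t len, off_t off)
{
	ssize_t n = pwrite(fd, buf, len, off);

	if (n < 0)
		return -1;
	return posix_fadvise(fd, off, n, POSIX_FADV_WRITEBEHIND);
}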

>
> One other issue we have is that the vm_dirty_ratio and background_ratio
> adjustments are a little coarse at these memory sizes. Since our
> minimum adjustment is 1%, we are adjusting by 40GB on the largest
> configuration from above. The hardware we are shipping today is capable
> of going to far greater amounts of memory, but we don't have customers
> demanding that yet. I would like to plan ahead for that and change
> vm_dirty_ratio from a straight percent into a millipercent (thousandth
> of a percent). Would that type of change be acceptable?

Oh drat. I think such a change would require a new set of /proc entries.
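
From userspace such an entry would just be one more file under
/proc/sys/vm; a sketch of driving it, where dirty_ratio_millis is a
made-up name for the finer-grained knob (only the integer-percent
dirty_ratio exists today):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/dirty_ratio_millis", "w");

	if (!f)
		return 1;	/* knob doesn't exist (yet) */
	fprintf(f, "500\n");	/* 500 millipercent = 0.5%: ~20GB of 4TB */
	fclose(f);
	return 0;
}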