Re: [PATCH 00/23] per device dirty throttling -v8

From: david
Date: Sat Aug 04 2007 - 03:47:21 EST


On Sat, 4 Aug 2007, Ingo Molnar wrote:

* Ingo Molnar <mingo@xxxxxxx> wrote:

There are positive reports in the never-ending "my system crawls like
an XT when copying large files" bugzilla entry:

http://bugzilla.kernel.org/show_bug.cgi?id=7372

I forgot this entry:

" We recently upgraded our office to gigabit Ethernet and got some big
AMD64 / 3ware boxes for file and vmware servers... only to find them
almost useless under any kind of real load. I've built some patched
2.6.21.6 kernels (using the bdi throttling patch you mentioned) to
see if our various Debian Etch boxes run better. So far my testing
shows a *great* improvement over the stock Debian 2.6.18 kernel on
our configurations. "

and the bdi throttling patches have been in -mm in the past, I think, so we
also know (to a certain degree) that they don't hurt the workloads that are
already fine.

[ my personal interest in this is the following regression: every time I
start a large kernel build with DEBUG_INFO on a quad-core 4GB RAM box,
I get complete pauses of up to 30 seconds in Vim (and most other tasks)
during plain editing of the source code. (This happens when Vim tries
to write() to its swap/undo file.) ]
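(As an aside, that kind of stall should be easy to see with a trivial probe:
time small write()+fsync() calls while the build is dirtying pages in the
background. The sketch below is only an illustration -- the file name and the
100ms reporting threshold are arbitrary choices, nothing specific to the
patches.)

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

int main(void)
{
        /* Time small editor-style writes so multi-second throttling
         * stalls show up in the output. */
        const char buf[] = "small editor-style write\n";
        int fd = open("latency-probe.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        for (;;) {
                struct timeval t0, t1;
                double ms;

                gettimeofday(&t0, NULL);
                if (write(fd, buf, sizeof(buf) - 1) < 0)
                        perror("write");
                fsync(fd);      /* force it out, like an editor saving its swap file */
                gettimeofday(&t1, NULL);

                ms = (t1.tv_sec - t0.tv_sec) * 1000.0 +
                     (t1.tv_usec - t0.tv_usec) / 1000.0;
                if (ms > 100.0) /* only report noticeable stalls */
                        printf("write+fsync took %.0f ms\n", ms);
                sleep(1);
        }
}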

I have an issue that sounds like it's related.

I've got a syslog server with two Opteron 246 CPUs, 16G of RAM, 2x140G 15k RPM drives (Fusion MPT hardware mirroring), and 16x500G 7200 RPM SATA drives on 3ware 9500 cards (software RAID6), running 2.6.20.3 with HZ at the default and preempt turned off.

I have syslog doing buffered writes to the SCSI drives, and every 5 minutes a cron job copies the data to the RAID array.

I've found that if I do anything significant on the large RAID array, the system loses a significant amount of the UDP syslog traffic, even though there should be plenty of RAM and CPU (and the spindles involved in the writes are not being touched); even a grep can cause up to 40% loss in the syslog traffic. I've experimented with nice levels (nicing down the grep and nicing up syslogd) without any noticeable effect on the losses.
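(One thing I plan to rule out first, independent of the throttling patches:
if syslogd itself stalls on disk I/O, the kernel drops datagrams as soon as
the socket's receive buffer fills, and the UDP InErrors counter in
/proc/net/snmp climbs. Below is a rough sketch of a receiver asking for a
bigger buffer -- the port and the 4MB figure are arbitrary, and whatever it
asks for is capped by net.core.rmem_max.)

#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
        struct sockaddr_in addr;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int want = 4 * 1024 * 1024;     /* ask for ~4MB; clamped by net.core.rmem_max */
        int got = 0;
        socklen_t len = sizeof(got);

        if (fd < 0) {
                perror("socket");
                return 1;
        }
        /* Request a larger receive buffer, then read back what we got. */
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want));
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);
        printf("effective receive buffer: %d bytes\n", got);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(514);     /* the usual syslog/UDP port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                perror("bind");
        return 0;
}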

I've been planning to try a new kernel with HZ=1000 to see if that would help, and after that to experiment with the various preempt settings, but it sounds like the per-device queues may actually be more relevant to the problem.
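Before rebuilding kernels, I figure it's worth checking whether the losses
line up with dirty-page buildup at all; something as simple as polling the
Dirty: and Writeback: lines of /proc/meminfo once a second during a test grep
should show that. A rough sketch (purely an observation aid):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        for (;;) {
                char line[256];
                FILE *f = fopen("/proc/meminfo", "r");

                if (!f)
                        return 1;
                /* Print only the dirty/writeback page counters. */
                while (fgets(line, sizeof(line), f)) {
                        if (!strncmp(line, "Dirty:", 6) ||
                            !strncmp(line, "Writeback:", 10))
                                fputs(line, stdout);
                }
                fclose(f);
                putchar('\n');
                fflush(stdout);
                sleep(1);
        }
}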

What would you suggest I test, and in what order and combination?

David Lang