Re: howto combat highly pathologic latencies on a server?

From: Hans-Peter Jansen
Date: Wed Mar 10 2010 - 20:20:51 EST


On Thursday 11 March 2010, 00:44:54 David Rees wrote:
> On Wed, Mar 10, 2010 at 9:17 AM, Hans-Peter Jansen <hpj@xxxxxxxxx> wrote:
> > While this system usually operates fine, it suffers from delays, that
> > are displayed in latencytop as: "Writing page to disk:     8425,5 ms":
> > ftp://urpla.net/lat-8.4sec.png, but we see them also in the 1.7-4.8 sec
> > range: ftp://urpla.net/lat-1.7sec.png, ftp://urpla.net/lat-2.9sec.png,
> > ftp://urpla.net/lat-4.6sec.png and ftp://urpla.net/lat-4.8sec.png.
> >
> > From other observations, this issue "feels" like it is induced by
> > single synchronisation points in the block layer, e.g. if I create heavy
> > IO load on one RAID array, say resizing a VMware disk image, it can
> > take up to a minute to log in by ssh, although the ssh login does not
> > touch this area at all (different RAID arrays). Note that the
> > latencytop snapshots above were made during normal operation, not under
> > this kind of load.
> >
> > Might later kernels mitigate this problem? As this is a production
> > system, that is used 6.5 days a week, I cannot do dangerous
> > experiments, also switching to 64 bit is a problem due to the legacy
> > stuff described above... OTOH, my users suffer from this, and anything
> > helping in this respect is highly appreciated.
>
> Seems like a 2.6.32 based kernel which has per-BDI writeback and "CFQ
> low latency mode" changes might help a good deal. I know that on one
> of my bigger machines (similar in specs to yours) which has a lot of
> processes which do a decent amount of IO, latency and load average has
> gone down after going to a 2.6.32 kernel from a 2.6.31 kernel (Fedora
> 11 system).
>
> Like Chris suggested, I've also heard that using the noop IO scheduler
> can work well on Areca controllers on some kernels and workloads.
> It's worth a shot and you can even try changing it at run-time.

Yes, already done. Hopefully my users will notice. As I upgraded this
server and the clients only two weeks ago, calming things down has the
highest priority.
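
For the archive, here is a minimal sketch of the run-time switch, assuming
the usual sysfs interface at /sys/block/<device>/queue/scheduler, where the
active scheduler is shown in square brackets. The device name "sda" and the
helper names are only illustrative, and writing to the file needs root:

```python
from pathlib import Path


def parse_schedulers(text):
    """Split a line such as 'noop deadline [cfq]' into (active, available)."""
    names = text.split()
    # The kernel marks the active scheduler with square brackets.
    active = next((n.strip("[]") for n in names if n.startswith("[")), None)
    available = [n.strip("[]") for n in names]
    return active, available


def scheduler_path(device, sysfs_root="/sys"):
    """Return the sysfs file that controls the scheduler of one block device."""
    return Path(sysfs_root) / "block" / device / "queue" / "scheduler"


def set_scheduler(device, name, sysfs_root="/sys"):
    """Switch the scheduler of `device` to `name` and return the active one."""
    path = scheduler_path(device, sysfs_root)
    active, available = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    if name != active:
        path.write_text(name)  # takes effect immediately, no reboot needed
    return parse_schedulers(path.read_text())[0]


if __name__ == "__main__":
    # Hypothetical device name; only reads the current setting.
    print(parse_schedulers(scheduler_path("sda").read_text()))
```

Calling set_scheduler("sda", "noop") as root would switch that one device
only; the setting does not survive a reboot, so it is easy to back out.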

Switching kernel versions on production systems is always painful, so I
try to avoid it, but this time I already needed to roll my own kernel for
the clients due to some aufs2 vs. apparmor disharmony. That led to the loss
of the latter - I can live without apparmor, but certainly not without a
reliable layered filesystem¹.

Anyway, thanks for your suggestion and confirmation, David. It is
appreciated.

Cheers,
Pete

¹) In a way, this is my primary justification to also use Linux on the
desktops²! Install one, and get the rest (nearly) for free.
http://download.opensuse.org/repositories/home:/frispete:/aufs2 and below.
²) Don't tell anybody that I don't like the other OS ;-)