Re: cgroup blkio bug/feedback

From: Vivek Goyal
Date: Thu Oct 13 2011 - 12:02:20 EST


[ Please don't top post. Respond inline ]

On Thu, Oct 13, 2011 at 05:37:33PM +0200, krzf83@xxxxxxxxx wrote:
> Rsync iops limiting thing was that I've tried limiting when rsync-ing
> from /dev/sdc (mounted as /ssd) to /home/ssd-copy (/home is /dev/md2).
> During that usage I've encountred overloads and system unresponsivness
> even greater than when not using limiting at all.

Ok, so you have your /home on md target and rsyncing from ssd to home and
hence trying to limit the impact of writes on /home by limiting write
rate on /home disk.

What's the file system you are using on /home ? I will try to do
something similar on local system and see if I can reproduce the
issue.

>
> I've also tried to limit iops for every "normal" user (not deamon
> running users) in the system for /home (/dev/md2). I've writen script
> that initialy assings pids to cgroups and initializes cgrulesengd so
> spawned apllications in the future will be in proper croups. I've
> encountred system overloads (hard reboot required) every 5-20 hours.
> That is also when I specifilcy did not limit tasks that were spawned
> by webserver (which are fastcgi php tasks and some passenger tasks).

So if you just put processes in a blkio cgroup but not specify any
limits, load average is fine? It is only when you specify some limits
load average goes up?

I am still scratching my head that how does that happen. Is it that
some application is forking more processes if sufficient IO is not
making progress due to throttling or what.

>
> Anyway as for my other tests with blkio memory limits
> (memory.limit_in_bytes)

A minor clarification. memory.limit_in_bytes is provided by memory controller
and not by blkio controller.

> I also got huge system overloads when tasks
> were killed. However this were probably due to websever spawning those
> again and again imideatly (mainly phusion passenger tasks). I've tried
> separating process-es that were spawned by webserver to other, not
> limited, cgroup, but as I recall (I've done it about 1,5 month ago)
> something were also causing overloads and constatant
> kill/respawn/kill/respawn in my production webserver.

Looks like you need to give more memory to this cgroup.

>
> As for blkio blkio.weight this would be fine thing, however it causes
> loadavg to spike like hell when limiting one process.

Are you using CFQ on your md raid component disks? What's the mdraid
configuraiton. Again, I might give it a shot here. Have not seen
anything like what you are explaining.

When this load average increases, can you capture "vmstat 2" output.
I am also curious to know who is forking off these extra processes
in the system. (may be some "ps" can help).

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/