Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group

From: Michael Wang
Date: Tue Jan 29 2019 - 20:54:00 EST

On 2019/1/28 3:21 PM, Yuzhoujian wrote:
[snip]
No offense, but I'm afraid you misunderstand the problem we are trying to
solve with wait_sum. If your purpose is to have a way to tell whether there
are sufficient CPUs inside a container, please try lxcfs + top: if there is
almost no idle time and the load is high, then the CPU resources are insufficient.
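For reference, the "lxcfs + top" style check boils down to the idle fraction
on the first "cpu" line of /proc/stat (the file lxcfs virtualizes per
container). A minimal sketch; the field layout is the standard /proc/stat
one, but any threshold you compare against is your own policy:

```python
# Sketch: compute the idle fraction from a /proc/stat "cpu" line.
# Fields: user nice system idle iowait irq softirq steal guest guest_nice.

def idle_fraction(cpu_line):
    fields = [int(v) for v in cpu_line.split()[1:]]
    idle = fields[3] + fields[4]  # idle + iowait ticks
    return idle / sum(fields)

# Example: 80 of 100 ticks idle -> plenty of spare CPU.
print(idle_fraction("cpu 10 0 10 70 10 0 0 0 0 0"))  # 0.8
```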

emmmm... Maybe I didn't make it clear. We need to dynamically adjust the
number of CPUs for a container based on the running state of the tasks inside
it. If we find that the tasks' wait_sum is really high, we will add more CPU
cores to the container; otherwise we will take some CPUs away from it.
In a word, we want to ensure 'co-scheduling' for high-priority containers.
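As a concrete illustration of the adjustment policy described above (a
sketch only: the cpu.stat-style "key value" format, the wait_sum key name
as a userspace-visible field, and the thresholds are my assumptions, not
the interface this patch adds):

```python
# Hypothetical controller step: compare two wait_sum samples taken
# `interval_ns` apart and decide whether to grow or shrink the
# container's cpuset. Thresholds are illustrative assumptions.

def parse_wait_sum(stat_text):
    """Pull a wait_sum value out of cpu.stat-style 'key value' lines."""
    for line in stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "wait_sum":
            return int(value)
    return 0

def adjust(prev_ns, cur_ns, interval_ns):
    """Return +1 (add a CPU), -1 (remove one), or 0 (leave alone)."""
    wait_ratio = (cur_ns - prev_ns) / interval_ns
    if wait_ratio > 0.20:      # tasks starve on the runqueue: grow
        return 1
    if wait_ratio < 0.05:      # almost no waiting: over-provisioned
        return -1
    return 0
```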


I understand that you want to use task wait time, which is a raw metric, but
IMHO when tasks wait more, idle will be lower and load will be higher; those
are more general metrics for telling whether CPUs are starving than per-task
wait time on the runqueue, and we rely on them too.

The only issue we had previously is that we didn't know what caused the low
idle and high load; it could be a wrong resource assignment or cgroup
competition. Now, with wait_sum, we can first make sure competition is low;
then, if idle is still low and load is still high inside the container, it is
time to assign more CPUs.
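The two-step check described above could look like this (a sketch under the
assumption that all three inputs are already normalized over the sampling
window; the thresholds are placeholders, not values from the thread):

```python
def need_more_cpus(wait_ratio, idle_ratio, load_per_cpu):
    """First rule out cgroup competition via wait_sum, then look at
    idle and load inside the container."""
    competition_low = wait_ratio < 0.05              # wait_sum is low
    starving = idle_ratio < 0.10 and load_per_cpu > 1.0
    return competition_low and starving
```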


Frankly speaking, this sounds like a supplement rather than a missing piece.
Although we don't rely on lxcfs and modify the kernel ourselves to support
the container environment, I still don't think this kind of solution belongs
in the kernel.

I don't care whether this value is considered a supplement or a missing
piece; I only care about how I can assess the running state inside a
container. I think lxcfs is really a good solution for improving the
visibility of container resources, but it is not good enough at the moment.

/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime

We can read these procfs files inside a container, but they still cannot
reflect real-time information. Please consider the following scenario: a
'rabbit' process generates 2000 tasks every 30ms, and these child tasks run
for only 1-5ms and then exit. How can we detect this thrashing workload
without the hierarchy wait_sum?
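A toy illustration of why sampling per-task state misses this workload while
a hierarchical sum does not (all numbers here are invented for illustration):

```python
# 2000 short-lived tasks per 30ms window. Any single sample of one
# task's wait time sees at most a few microseconds, and the task is
# gone before the next sample. A hierarchy-level wait_sum keeps
# accumulating after the tasks exit, so the pressure stays visible.

per_task_wait_ns = 50_000       # ~50us queued per short task (invented)
tasks_per_window = 2000

window_wait_ns = per_task_wait_ns * tasks_per_window
print(window_wait_ns)           # 100ms of total waiting per 30ms window
```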

As mentioned, we implement the isolation ourselves, so we see isolated idle
and load information inside the container rather than the host data. We don't
rely on lxcfs, but we know it does similar work. So, what you get by reading
/proc/stat: does it show the isolated idle data? You would need an isolated
/proc/loadavg too.

Anyway, IMHO this is a special requirement only for container environments,
not a general solution to a kernel problem, so I would suggest either helping
to improve lxcfs so it is useful for your production, or making the
modification in your own kernel.

Regards,
Michael Wang


Thanks,
Yuzhoujian