On Fri 09-11-12 18:23:07, Sha Zhengju wrote:On 11/09/2012 12:25 AM, Michal Hocko wrote:[...]On Thu 08-11-12 23:52:47, Sha Zhengju wrote:Ohh, I probably wasn't clear enough. I didn't suggest cumulativeWe haven't used those complicate hierarchy yet, but it sounds a good+ for (i = 0; i< MEM_CGROUP_STAT_NSTATS; i++) {This is nice and simple I am just thinking whether it is enough. Say
+ long long val = 0;
+ if (i == MEM_CGROUP_STAT_SWAP&& !do_swap_account)
+ continue;
+ for_each_mem_cgroup_tree(mi, memcg)
+ val += mem_cgroup_read_stat(mi, i);
+ printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i], K(val));
+ }
+
+ for (i = 0; i< NR_LRU_LISTS; i++) {
+ unsigned long long val = 0;
+
+ for_each_mem_cgroup_tree(mi, memcg)
+ val += mem_cgroup_nr_lru_pages(mi, BIT(i));
+ printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i], K(val));
+ }
+ printk(KERN_CONT "\n");
that you have a deeper hierarchy and the there is a safety limit in the
its root
A (limit)
/|\
B C D
|\
E F
and we trigger an OOM on the A's limit. Now we know that something blew
up but what it was we do not know. Wouldn't it be better to swap the for
and for_each_mem_cgroup_tree loops? Then we would see the whole
hierarchy and can potentially point at the group which doesn't behave.
Memory cgroup stats for A/: ...
Memory cgroup stats for A/B/: ...
Memory cgroup stats for A/C/: ...
Memory cgroup stats for A/D/: ...
Memory cgroup stats for A/D/E/: ...
Memory cgroup stats for A/D/F/: ...
Would it still fit in with your use case?
[...]
suggestion. :)
Hierarchy is a little complex to use from our experience, and the
three cgroups involved in memcg oom can be different: memcg of
invoker, killed task, memcg of going over limit.Suppose a process in
B triggers oom and a victim in root A is selected to be killed, we
may as well want to know memcg stats just local in A cgroup(excludes
BCD). So besides hierarchy info, does it acceptable to also print
the local root node stats which as I did in the V1
version(https://lkml.org/lkml/2012/7/30/179).
numbers. Only per group. So it would be something like:
for_each_mem_cgroup_tree(mi, memcg) {
printk("Memory cgroup stats for %s", memcg_name);
for (i = 0; i< MEM_CGROUP_STAT_NSTATS; i++) {
if (i == MEM_CGROUP_STAT_SWAP&& !do_swap_account)
continue;
printk(KERN_CONT "%s:%lldKB ", mem_cgroup_stat_names[i],
K(mem_cgroup_read_stat(mi, i)));
}
for (i = 0; i< NR_LRU_LISTS; i++)
printk(KERN_CONT "%s:%lluKB ", mem_cgroup_lru_names[i],
K(mem_cgroup_nr_lru_pages(mi, BIT(i))));
printk(KERN_CONT"\n");
}
Another one I'm hesitating is numa stats, it seems the output isNUMA stats are basically per node - per zone LRU data and that the
beginning to get more and more....
for(NR_LRU_LISTS) can be easily extended to cover that.