Re: [PATCH 1/7] Fix mem_cgroup_hierarchical_reclaim() to do stable hierarchy walk.

From: Michal Hocko
Date: Wed Jun 22 2011 - 11:15:32 EST


On Thu 16-06-11 12:51:41, KAMEZAWA Hiroyuki wrote:
[...]
> @@ -1667,41 +1668,28 @@ static int mem_cgroup_hierarchical_recla
>  	if (!check_soft && root_mem->memsw_is_minimum)
>  		noswap = true;
>
> -	while (1) {
> +again:
> +	if (!shrink) {
> +		visit = 0;
> +		for_each_mem_cgroup_tree(victim, root_mem)
> +			visit++;
> +	} else {
> +		/*
> +		 * At shrinking, we check the usage again in caller side.
> +		 * so, visit children one by one.
> +		 */
> +		visit = 1;
> +	}
> +	/*
> +	 * We are not draining per cpu cached charges during soft limit reclaim
> +	 * because global reclaim doesn't care about charges. It tries to free
> +	 * some memory and charges will not give any.
> +	 */
> +	if (!check_soft)
> +		drain_all_stock_async(root_mem);
> +
> +	while (visit--) {

This is racy, isn't it? What prevents some groups from disappearing in the
meantime? We would then reclaim more than we want from those that are left.
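
To make the concern concrete, here is a minimal userspace sketch (not
kernel code). It assumes a fixed ring of groups picked round-robin; struct
ring, select_victim() and walk_with_stale_count() are hypothetical names
invented for illustration. The walk counts the live groups first and then
visits that many victims, while some groups vanish after the count.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 8

/* Hypothetical stand-in for a memcg hierarchy: a ring of groups. */
struct ring {
	int alive[MAX_GROUPS];	/* 1 if group i still exists */
	int visits[MAX_GROUPS];	/* how often group i was picked */
	int next;		/* round-robin cursor */
};

static int count_alive(const struct ring *r)
{
	int i, n = 0;

	for (i = 0; i < MAX_GROUPS; i++)
		n += r->alive[i];
	return n;
}

/* Pick the next live group round-robin; -1 if none are left. */
static int select_victim(struct ring *r)
{
	int i;

	for (i = 0; i < MAX_GROUPS; i++) {
		int idx = (r->next + i) % MAX_GROUPS;

		if (r->alive[idx]) {
			r->next = (idx + 1) % MAX_GROUPS;
			return idx;
		}
	}
	return -1;
}

/*
 * Count first, then visit that many victims. The groups listed in
 * removed[] vanish after the count, which models the race.
 * Returns the largest number of visits any single group received.
 */
static int walk_with_stale_count(struct ring *r, const int *removed,
				 int nremoved)
{
	int visit, i, max = 0;

	assert(r != NULL);
	visit = count_alive(r);

	for (i = 0; i < nremoved; i++)
		r->alive[removed[i]] = 0;

	while (visit--) {
		int v = select_victim(r);

		if (v < 0)
			break;
		r->visits[v]++;
	}

	for (i = 0; i < MAX_GROUPS; i++)
		if (r->visits[i] > max)
			max = r->visits[i];
	return max;
}
```

With four live groups and two of them removed after the count, the two
survivors are each visited twice in this model, which is the over-reclaim
described above.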

Why cannot we simply do something like (totally untested):

Index: linus_tree/mm/memcontrol.c
===================================================================
--- linus_tree.orig/mm/memcontrol.c 2011-06-22 17:11:54.000000000 +0200
+++ linus_tree/mm/memcontrol.c 2011-06-22 17:13:05.000000000 +0200
@@ -1652,7 +1652,7 @@ static int mem_cgroup_hierarchical_recla
 						unsigned long reclaim_options,
 						unsigned long *total_scanned)
 {
-	struct mem_cgroup *victim;
+	struct mem_cgroup *victim, *first_victim = NULL;
 	int ret, total = 0;
 	int loop = 0;
 	bool noswap = reclaim_options & MEM_CGROUP_RECLAIM_NOSWAP;
@@ -1669,6 +1669,11 @@ static int mem_cgroup_hierarchical_recla
 
 	while (1) {
 		victim = mem_cgroup_select_victim(root_mem);
+		if (!first_victim)
+			first_victim = victim;
+		else if (first_victim == victim)
+			break;
+
 		if (victim == root_mem) {
 			loop++;
 			/*
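
The termination rule in the diff above can be tried out in isolation with
a small userspace sketch. This is not kernel code: struct walker,
select_victim() and walk_once() are hypothetical names, and the ring of
NGROUPS groups is assumed to stay unchanged during the walk.

```c
#include <assert.h>
#include <stddef.h>

#define NGROUPS 4

/* Hypothetical cursor standing in for mem_cgroup_select_victim(). */
struct walker {
	int next;
};

static int select_victim(struct walker *w)
{
	int v = w->next;

	w->next = (w->next + 1) % NGROUPS;
	return v;
}

/*
 * Visit victims until the first one comes around again.
 * Returns the number of groups visited; visits[] holds per-group counts.
 */
static int walk_once(struct walker *w, int visits[NGROUPS])
{
	int first_victim = -1;
	int visited = 0;

	assert(w != NULL && visits != NULL);
	for (;;) {
		int victim = select_victim(w);

		if (first_victim < 0)
			first_victim = victim;
		else if (victim == first_victim)
			break;	/* one full round is done */

		visits[victim]++;
		visited++;
	}
	return visited;
}
```

Starting from any cursor position, each group is visited exactly once in
this model. The sketch does not cover the case where the first victim
itself goes away during the walk; the comparison would then never match,
so some other bound on the loop would still be needed.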
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic