Re: 3.13-rc breaks MEMCG_SWAP

From: Michal Hocko
Date: Mon Dec 16 2013 - 05:40:53 EST


On Mon 16-12-13 10:53:45, Michal Hocko wrote:
> On Mon 16-12-13 17:36:09, Li Zefan wrote:
> > On 2013/12/16 16:36, Hugh Dickins wrote:
> > > CONFIG_MEMCG_SWAP is broken in 3.13-rc. Try something like this:
> > >
> > > mkdir -p /tmp/tmpfs /tmp/memcg
> > > mount -t tmpfs -o size=1G tmpfs /tmp/tmpfs
> > > mount -t cgroup -o memory memcg /tmp/memcg
> > > mkdir /tmp/memcg/old
> > > echo 512M >/tmp/memcg/old/memory.limit_in_bytes
> > > echo $$ >/tmp/memcg/old/tasks
> > > cp /dev/zero /tmp/tmpfs/zero 2>/dev/null
> > > echo $$ >/tmp/memcg/tasks
> > > rmdir /tmp/memcg/old
> > > sleep 1 # let rmdir work complete
> > > mkdir /tmp/memcg/new
> > > umount /tmp/tmpfs
> > > dmesg | grep WARNING
> > > rmdir /tmp/memcg/new
> > > umount /tmp/memcg
> > >
> > > Shows lots of WARNING: CPU: 1 PID: 1006 at kernel/res_counter.c:91
> > > res_counter_uncharge_locked+0x1f/0x2f()
> > >
> > > Breakage comes from 34c00c319ce7 ("memcg: convert to use cgroup id").
> > >
> > > The lifetime of a cgroup id is different from the lifetime of the
> > > css id it replaced: memsw's css_get()s do nothing to hold on to the
> > > old cgroup id, it soon gets recycled to a new cgroup, which then
> > > mysteriously inherits the old's swap, without any charge for it.
> > > (I thought memsw's particular need had been discussed and was
> > > well understood when 34c00c319ce7 went in, but apparently not.)
> > >
> > > The right thing to do at this stage would be to revert that and its
> > > associated commits; but I imagine to do so would be unwelcome to
> > > the cgroup guys, going against their general direction; and I've
> > > no idea how embedded that css_id removal has become by now.
> > >
> > > Perhaps some creative refcounting can rescue memsw while still
> > > using cgroup id?
> > >
> >
> > Sorry for the breakage.
> >
> > I think we can keep the cgroup->id until the last css reference is
> > dropped and the css is scheduled to be destroyed.
>
> How would this work? The task which pushed the memory to the swap is
> still alive (living in a different group) and the swap will be there
> after the last reference to css as well.

Or did you mean to take a css reference in swap_cgroup_record and release
it in __mem_cgroup_try_charge_swapin?

That would prevent the warning (assuming idr_remove moved to
css_free[1]), but I am not sure it is the right thing to do. By then the
memsw charges will already have been accounted to the parent (assuming
there is one), and nobody would uncharge them, because all uncharges
fall back to the root memcg after css_offline.
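
To make the two id lifetimes concrete, here is a userspace toy model. It
is not kernel code: IdAllocator, Group and simulate() are made-up names,
and the allocator simply assumes the lowest free id is handed out first.
It also ignores reparenting at css_offline, so it only illustrates the
id recycling and says nothing about whether pinning is the right fix.

```python
# Toy model only; every name here is hypothetical, none is a kernel symbol.

class IdAllocator:
    """Hands out the lowest free id, so a freed id is reused quickly."""

    def __init__(self):
        self.used = set()

    def alloc(self):
        i = 1
        while i in self.used:
            i += 1
        self.used.add(i)
        return i

    def free(self, i):
        self.used.discard(i)


class Group:
    def __init__(self, ida):
        self.id = ida.alloc()
        self.memsw = 0       # swap pages charged to this group
        self.refs = 0        # references held by swap records
        self.underflows = 0  # uncharges that found nothing to uncharge

    def uncharge(self):
        if self.memsw == 0:
            self.underflows += 1  # stands in for the res_counter WARNING
        else:
            self.memsw -= 1


def simulate(pin_id):
    """Return (new group reused old id?, underflows seen by new group)."""
    ida = IdAllocator()
    by_id = {}

    old = Group(ida)
    by_id[old.id] = old

    # Swap out three pages; each swap record remembers only the owner's id.
    records = []
    for _ in range(3):
        old.memsw += 1
        if pin_id:
            old.refs += 1
        records.append(old.id)

    # "rmdir old": the id is released now unless swap records pin it.
    if not (pin_id and old.refs):
        ida.free(old.id)
        del by_id[old.id]

    new = Group(ida)
    by_id[new.id] = new

    # Swap is freed later: uncharge whoever owns the recorded id by then.
    for rid in records:
        owner = by_id[rid]
        owner.uncharge()
        if pin_id:
            owner.refs -= 1
            if owner.refs == 0:
                ida.free(owner.id)
                del by_id[owner.id]

    return (new.id == old.id, new.underflows)
```

Without pinning, the new group takes over the old id and absorbs every
uncharge with nothing charged to it. With pinning, the id stays with the
old group until its last swap record goes away.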

Hugh's approach seems much better.

---
[1] Is this even possible? I cannot say I understand the comment above
idr_remove in cgroup_destroy_css_killed 100%, but it suggests we cannot
postpone it.
--
Michal Hocko
SUSE Labs