Re: [PATCH 1/5] mm: memcontrol: enable kmem accounting for all cgroups in the legacy hierarchy

From: Johannes Weiner
Date: Mon Feb 08 2016 - 00:46:41 EST


On Sun, Feb 07, 2016 at 08:27:31PM +0300, Vladimir Davydov wrote:
> Currently, in the legacy hierarchy kmem accounting is off for all
> cgroups by default and must be enabled explicitly by writing something
> to memory.kmem.limit_in_bytes. Since we don't support reclaim on hitting
> kmem limit, nor do we have any plans to implement it, this is likely to
> be -1, just to enable kmem accounting and limit kernel memory
> consumption by the memory.limit_in_bytes along with user memory.
>
> This user API was introduced when the implementation of kmem accounting
> lacked slab shrinker support and hence was useless in practice. Things
> have changed since then - slab shrinkers were made memcg aware, the
> accounting overhead seems to be negligible, and a failure to charge a
> kmem allocation should not have critical consequences, because we only
> account those kernel objects that should be safe to fail. That's why
> kmem accounting is enabled by default for all cgroups in the default
> hierarchy, which will eventually replace the legacy one.
>
> The ability to enable kmem accounting for some cgroups while keeping it
> disabled for others is getting difficult to maintain. E.g. to make
> shadow node shrinker memcg aware (see mm/workingset.c), we need to know
> the relationship between the number of shadow nodes allocated for a
> cgroup and the size of its lru list. If kmem accounting is enabled for
> all cgroups there is no problem, but what should we do if kmem
> accounting is enabled only for half of cgroups? We've no other choice
> but use global lru stats while scanning root cgroup's shadow nodes, but
> that would be wrong if kmem accounting was enabled for all cgroups
> (which is the case if the unified hierarchy is used), in which case we
> should use lru stats of the root cgroup's lruvec.
>
> That being said, let's enable kmem accounting for all memory cgroups by
> default. If one finds it unstable or too costly, it can always be
> disabled system-wide by passing cgroup.memory=nokmem to the kernel at
> boot time.
>
> Signed-off-by: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

A little bolder than I would have preferred for legacy memcg, but I
don't think we have another choice here. And you're right, accounting
costs are a far cry from what they once were. So I'm okay with this.