Re: [PATCH] mm, memcg: Optionally disable memcg by default using Kconfig

From: Johannes Weiner
Date: Tue May 19 2015 - 10:18:30 EST


CC'ing Tejun and cgroups for the generic cgroup interface part

On Tue, May 19, 2015 at 11:40:57AM +0100, Mel Gorman wrote:
> memcg was reported years ago to have significant overhead when unused. It
> has improved but it's still the case that users that have no knowledge of
> memcg pay a performance penalty.
>
> This patch adds a Kconfig that controls whether memcg is enabled by default
> and a kernel parameter cgroup_enable= to enable it if desired. Anyone using
> oldconfig will get the historical behaviour. It is not an option for most
> distributions to simply disable MEMCG as there are users that require it
> but they should also be knowledgable enough to use cgroup_enable=.
>
> This was evaluated using aim9, a page fault microbenchmark and ebizzy
> but I'll focus on the page fault microbenchmark. It can be reproduced
> using pft from mmtests (https://github.com/gormanm/mmtests). Edit
> configs/config-global-dhp__pagealloc-performance and update MMTESTS to
> only contain pft. This is the relevant part of the profile summary
>
> /usr/src/linux-4.0-vanilla/mm/memcontrol.c 6.6441 395842
> mem_cgroup_try_charge 2.950% 175781

Ouch. Do you have a way to get the per-instruction breakdown of this?
This function really isn't doing much. I'll try to reproduce it here
too, I haven't seen such high costs with pft in the past.

> __mem_cgroup_count_vm_event 1.431% 85239
> mem_cgroup_page_lruvec 0.456% 27156
> mem_cgroup_commit_charge 0.392% 23342
> uncharge_list 0.323% 19256
> mem_cgroup_update_lru_size 0.278% 16538
> memcg_check_events 0.216% 12858
> mem_cgroup_charge_statistics.isra.22 0.188% 11172
> try_charge 0.150% 8928
> commit_charge 0.141% 8388
> get_mem_cgroup_from_mm 0.121% 7184
>
> It's showing 6.64% overhead in memcontrol.c when no memcgs are in
> use. Applying the patch and disabling memcg reduces this to 0.48%

The frustrating part is that 4.5% of that is not even coming from the
main accounting and tracking work. I'm looking into getting this
fixed regardless of what happens with this patch.

> /usr/src/linux-4.0-nomemcg-v1r1/mm/memcontrol.c 0.4834 27511
> mem_cgroup_page_lruvec 0.161% 9172
> mem_cgroup_update_lru_size 0.154% 8794
> mem_cgroup_try_charge 0.126% 7194
> mem_cgroup_commit_charge 0.041% 2351
>
> Note that it's not very visible from headline performance figures
>
> pft faults
> 4.0.0 4.0.0
> vanilla nomemcg-v1
> Hmean faults/cpu-1 1443258.1051 ( 0.00%) 1530574.6033 ( 6.05%)
> Hmean faults/cpu-3 1340385.9270 ( 0.00%) 1375156.5834 ( 2.59%)
> Hmean faults/cpu-5 875599.0222 ( 0.00%) 876217.9211 ( 0.07%)
> Hmean faults/cpu-7 601146.6726 ( 0.00%) 599068.4360 ( -0.35%)
> Hmean faults/cpu-8 510728.2754 ( 0.00%) 509887.9960 ( -0.16%)
> Hmean faults/sec-1 1432084.7845 ( 0.00%) 1518566.3541 ( 6.04%)
> Hmean faults/sec-3 3943818.1437 ( 0.00%) 4036918.0217 ( 2.36%)
> Hmean faults/sec-5 3877573.5867 ( 0.00%) 3922745.9207 ( 1.16%)
> Hmean faults/sec-7 3991832.0418 ( 0.00%) 3990670.8481 ( -0.03%)
> Hmean faults/sec-8 3987189.8167 ( 0.00%) 3978842.8107 ( -0.21%)
>
> Low thread counts get a boost but it's within noise as memcg overhead does
> not dominate. It's not obvious at all at higher thread counts as other
> factors cause more problems. The overall breakdown of CPU usage looks like
>
> 4.0.0 4.0.0
> vanilla nomemcg-v1
> User 41.45 41.11
> System 410.19 404.76
> Elapsed 130.33 126.30
>
> Despite the relative unimportance, there is at least some justification
> for disabling memcg by default.

I guess so. The only thing I don't like about this is that it changes
the default of a single controller. While there is some justification
from an overhead standpoint, it's a little weird in terms of interface
when you boot, say, a distribution kernel and it has cgroups with all
but one resource controller available.

Would it make more sense to provide a Kconfig option that disables all
resource controllers per default? There is still value in having only
the generic cgroup part for grouped process monitoring and control.

Thanks,
Johannes

> Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
> ---
> Documentation/kernel-parameters.txt | 4 ++++
> init/Kconfig | 15 +++++++++++++++
> kernel/cgroup.c | 20 ++++++++++++++++----
> mm/memcontrol.c | 3 +++
> 4 files changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index bfcb1a62a7b4..4f264f906816 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -591,6 +591,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> cut the overhead, others just disable the usage. So
> only cgroup_disable=memory is actually worthy}
>
> + cgroup_enable= [KNL] Enable a particular controller
> + Similar to cgroup_disable except that it enables
> + controllers that are disabled by default.
> +
> checkreqprot [SELINUX] Set initial checkreqprot flag value.
> Format: { "0" | "1" }
> See security/selinux/Kconfig help text.
> diff --git a/init/Kconfig b/init/Kconfig
> index f5dbc6d4261b..819b6cc05cba 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -990,6 +990,21 @@ config MEMCG
> Provides a memory resource controller that manages both anonymous
> memory and page cache. (See Documentation/cgroups/memory.txt)
>
> +config MEMCG_DEFAULT_ENABLED
> + bool "Automatically enable memory resource controller"
> + default y
> + depends on MEMCG
> + help
> + The memory controller has some overhead even if idle as resource
> + usage must be tracked in case a group is created and a process
> + migrated. As users may not be aware of this and the cgroup_disable=
> + option, this config option controls whether it is enabled by
> + default. It is assumed that someone that requires the controller
> + can find the cgroup_enable= switch.
> +
> + Say N if unsure. This is default Y to preserve oldconfig and
> + historical behaviour.
> +
> config MEMCG_SWAP
> bool "Memory Resource Controller Swap Extension"
> depends on MEMCG && SWAP
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 29a7b2cc593e..0e79db55bf1a 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -5370,7 +5370,7 @@ out_free:
> kfree(pathbuf);
> }
>
> -static int __init cgroup_disable(char *str)
> +static int __init __cgroup_set_state(char *str, bool disabled)
> {
> struct cgroup_subsys *ss;
> char *token;
> @@ -5382,16 +5382,28 @@ static int __init cgroup_disable(char *str)
>
> for_each_subsys(ss, i) {
> if (!strcmp(token, ss->name)) {
> - ss->disabled = 1;
> - printk(KERN_INFO "Disabling %s control group"
> - " subsystem\n", ss->name);
> + ss->disabled = disabled;
> + printk(KERN_INFO "Setting %s control group"
> + " subsystem %s\n", ss->name,
> + disabled ? "disabled" : "enabled");
> break;
> }
> }
> }
> return 1;
> }
> +
> +static int __init cgroup_disable(char *str)
> +{
> + return __cgroup_set_state(str, true);
> +}
> +
> +static int __init cgroup_enable(char *str)
> +{
> + return __cgroup_set_state(str, false);
> +}
> __setup("cgroup_disable=", cgroup_disable);
> +__setup("cgroup_enable=", cgroup_enable);
>
> static int __init cgroup_set_legacy_files_on_dfl(char *str)
> {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b34ef4a32a3b..ce171ba16949 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5391,6 +5391,9 @@ struct cgroup_subsys memory_cgrp_subsys = {
> .dfl_cftypes = memory_files,
> .legacy_cftypes = mem_cgroup_legacy_files,
> .early_init = 0,
> +#ifndef CONFIG_MEMCG_DEFAULT_ENABLED
> + .disabled = 1,
> +#endif
> };
>
> /**
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/