Re: [PATCH v8 05/17] mm: Assign memcg-aware shrinkers bitmap to memcg

From: Andrew Morton
Date: Tue Jul 03 2018 - 16:50:11 EST


On Tue, 03 Jul 2018 18:09:26 +0300 Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:

> Imagine a big node with many CPUs, memory cgroups and containers.
> Say we have 200 containers, and every container has 10 mounts
> and 10 cgroups. Tasks in a container never touch other
> containers' mounts. If there is intensive page writing
> and global reclaim happens, a writing task has to iterate
> over all memcgs to shrink slab before it is able to go on
> to shrink_page_list().
>
> Iteration over all the memcg slabs is very expensive:
> the task has to visit 200 * 10 = 2000 shrinkers
> for every memcg, and since there are 2000 memcgs,
> that makes 2000 * 2000 = 4000000 calls in total.
>
> So, the shrinker makes 4 million do_shrink_slab() calls
> just to try to isolate SWAP_CLUSTER_MAX pages in one
> of the actively writing memcgs via shrink_page_list().
> I've observed a node spending almost 100% of its time in
> the kernel, iterating uselessly over already shrunk slab.
>
> This patch adds a bitmap of memcg-aware shrinkers to each memcg.
> The size of the bitmap depends on bitmap_nr_ids, and over
> the memcg's lifetime it is kept large enough to fit
> bitmap_nr_ids shrinkers. Each bit in the map corresponds
> to the shrinker with that id.
>
> Later patches will keep a shrinker's bit set only in memcgs
> that are really charged. This will allow shrink_slab() to
> improve its performance significantly. See the last patch
> for the numbers.
>
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -182,6 +182,11 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
>  	if (id < 0)
>  		goto unlock;
>
> +	if (memcg_expand_shrinker_maps(id)) {
> +		idr_remove(&shrinker_idr, id);
> +		goto unlock;
> +	}
> +
>  	if (id >= shrinker_nr_max)
>  		shrinker_nr_max = id + 1;
>  	shrinker->id = id;

This function ends up being a rather sad little thing.

: static int prealloc_memcg_shrinker(struct shrinker *shrinker)
: {
: 	int id, ret = -ENOMEM;
:
: 	down_write(&shrinker_rwsem);
: 	id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
: 	if (id < 0)
: 		goto unlock;
:
: 	if (memcg_expand_shrinker_maps(id)) {
: 		idr_remove(&shrinker_idr, id);
: 		goto unlock;
: 	}
:
: 	if (id >= shrinker_nr_max)
: 		shrinker_nr_max = id + 1;
: 	shrinker->id = id;
: 	ret = 0;
: unlock:
: 	up_write(&shrinker_rwsem);
: 	return ret;
: }

- there's no need to call memcg_expand_shrinker_maps() unless id >=
shrinker_nr_max, so why not move the call under that check and avoid
calling memcg_expand_shrinker_maps() in most cases?

- why aren't we decreasing shrinker_nr_max in
unregister_memcg_shrinker()? That's easy to do, avoids pointless
work in shrink_slab_memcg() and avoids memory waste in future
prealloc_memcg_shrinker() calls.

It should be possible to find the highest ID in an IDR tree with a
straightforward descent of the underlying radix tree, but I doubt if
that has been wired up. Otherwise a simple loop in
unregister_memcg_shrinker() would be needed.