Re: [PATCH 2/2] mm/memcontrol: split local and nested atomic vmstats/vmevents counters

From: Konstantin Khlebnikov
Date: Thu Jul 18 2019 - 11:08:12 EST


On 17.07.2019 20:53, Johannes Weiner wrote:
> On Wed, Jul 17, 2019 at 03:29:19PM +0300, Konstantin Khlebnikov wrote:
>> This is alternative solution for problem addressed in commit 815744d75152
>> ("mm: memcontrol: don't batch updates of local VM stats and events").
>>
>> Instead of adding second set of percpu counters which wastes memory and
>> slows down showing statistics in cgroup-v1 this patch use two arrays of
>> atomic counters: local and nested statistics.
>>
>> Then update has the same amount of atomic operations: local update and
>> one nested for each parent cgroup. Readers of hierarchical statistics
>> have to sum two atomics which isn't a big deal.
>>
>> All updates are still batched using one set of percpu counters.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
>
> Yeah that looks better. Note that it was never about the atomics,
> though, but rather the number of cachelines dirtied. Your patch should
> solve this problem as well, but it might be a good idea to run
> will-it-scale on it to make sure the struct layout is still fine.


Looks like this patch shows a 2% regression on the 24-core, 2-NUMA-node
machine I have. Complete removal of these counters gives a 2% boost.
Also, I cannot reproduce the regression fixed by commit 815744d75152:
reverting it has no effect.

So, feel free to ignore the second patch. I'll play with this a little more.

Maybe atomic per-NUMA counters could give a nice balance between scalability and overhead.
Ideally this memory could be mapped in a per-cpu manner to give atomic access via fs/gs.
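
For anyone following the thread, the local/nested scheme from the quoted patch
description can be sketched in userspace C roughly as below. This is only an
illustration, not the patch itself: struct group, BATCH, NR_CPUS_SKETCH and the
group_*() helpers are made-up names, a plain array stands in for real percpu
data, and C11 atomics stand in for the kernel's atomic_long_t.

```c
#include <stdatomic.h>
#include <stddef.h>

#define BATCH 32          /* hypothetical flush threshold */
#define NR_CPUS_SKETCH 4  /* hypothetical CPU count for the stand-in array */

struct group {
	struct group *parent;
	atomic_long local;             /* events charged directly to this group */
	atomic_long nested;            /* events charged to any descendant */
	long pending[NR_CPUS_SKETCH];  /* stand-in for one set of percpu batches */
};

/* Flush one CPU's batch: one local atomic, one nested atomic per parent. */
static void group_flush(struct group *g, int cpu)
{
	long delta = g->pending[cpu];

	if (!delta)
		return;
	g->pending[cpu] = 0;
	atomic_fetch_add(&g->local, delta);
	for (struct group *p = g->parent; p; p = p->parent)
		atomic_fetch_add(&p->nested, delta);
}

/* Batched update: touch only the per-CPU slot until the threshold is hit. */
static void group_mod(struct group *g, int cpu, long val)
{
	long x = g->pending[cpu] + val;

	g->pending[cpu] = x;
	if (x > BATCH || x < -BATCH)
		group_flush(g, cpu);
}

/* Local readers need one atomic, hierarchical readers sum two. */
static long group_read_local(struct group *g)
{
	return atomic_load(&g->local);
}

static long group_read_hier(struct group *g)
{
	return atomic_load(&g->local) + atomic_load(&g->nested);
}
```

In this sketch an update dirties the same atomics as a hierarchical-only
scheme would (one per level), just split so that the group's own level goes
to "local" and every ancestor gets "nested". Whether the resulting struct
layout behaves well is exactly what will-it-scale would have to show.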