Re: [PATCH 1/2] mm: NUMA stats code cleanup and enhancement

From: Michal Hocko
Date: Wed Nov 29 2017 - 07:17:50 EST


On Tue 28-11-17 14:00:23, Kemi Wang wrote:
> The existing implementation of NUMA counters is per logical CPU, with
> zone->vm_numa_stat[] separated by zone, plus a global NUMA counter array
> vm_numa_stat[]. However, unlike the other vmstat counters, NUMA stats don't
> affect the system's decisions and are only read from /proc and /sys. Reading
> them is a slow path operation that can likely tolerate higher overhead.
> Additionally, nodes usually have only a single zone, except for node 0, and
> there isn't really any use case where you need these hit counts separated
> by zone.
>
> Therefore, we can migrate the implementation of NUMA stats from per-zone to
> per-node and get rid of the global NUMA counters. It's good enough to
> keep everything in a per-CPU pointer of type u64 and sum the values up when
> needed, as suggested by Andi Kleen. That helps with code cleanup and
> enhancement (e.g. it saves more than 130 lines of code).

I agree. Having these stats per zone is an overcomplication. The
only consumer is /proc/zoneinfo, and I would argue that this doesn't
justify the additional complexity. Who really needs the numbers broken
out per zone?

Anyway, I haven't checked your implementation too deeply, but why don't
you simply define a static percpu array for each NUMA node?
[...]
> +extern u64 __percpu *vm_numa_stat;
[...]
> +#ifdef CONFIG_NUMA
> + size = sizeof(u64) * num_possible_nodes() * NR_VM_NUMA_STAT_ITEMS;
> + align = __alignof__(u64[num_possible_nodes() * NR_VM_NUMA_STAT_ITEMS]);
> + vm_numa_stat = (u64 __percpu *)__alloc_percpu(size, align);
> +#endif
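
To illustrate the question above, here is a rough userspace sketch of the
static layout, not a patch. The SKETCH_* sizes and both helper names are
made up for the example, and the enum lists only a subset of the real
items. In the kernel, the array would presumably be declared with
something like DEFINE_PER_CPU and sized by MAX_NUMNODES rather than
carrying an explicit CPU dimension; treat that mapping as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for this sketch only; the kernel would use its own
 * build-time limits instead. */
#define SKETCH_NR_CPUS   4
#define SKETCH_MAX_NODES 2

/* Subset of the NUMA stat items, enough to show the indexing. */
enum numa_stat_item {
	NUMA_HIT,
	NUMA_MISS,
	NUMA_FOREIGN,
	NR_VM_NUMA_STAT_ITEMS
};

/* Stand-in for a static per-CPU array: one [node][item] table per CPU,
 * sized at build time, so no runtime allocation or alignment math. */
static uint64_t vm_numa_stat[SKETCH_NR_CPUS][SKETCH_MAX_NODES][NR_VM_NUMA_STAT_ITEMS];

/* Fast path: bump only the given CPU's own slot. */
static void numa_stat_inc(unsigned int cpu, unsigned int node,
			  enum numa_stat_item item)
{
	assert(cpu < SKETCH_NR_CPUS && node < SKETCH_MAX_NODES);
	vm_numa_stat[cpu][node][item]++;
}

/* Slow path (a /proc or /sys read): fold every CPU's slot into one total. */
static uint64_t numa_stat_sum(unsigned int node, enum numa_stat_item item)
{
	uint64_t sum = 0;
	unsigned int cpu;

	assert(node < SKETCH_MAX_NODES);
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += vm_numa_stat[cpu][node][item];
	return sum;
}
```

The trade-off is that a static array is sized for the maximum node count
rather than num_possible_nodes(), in exchange for dropping the allocation
and the explicit pointer.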
--
Michal Hocko
SUSE Labs