[PATCH v3 00/35] Memory allocation profiling

From: Suren Baghdasaryan
Date: Mon Feb 12 2024 - 16:40:03 EST


Memory allocation, v3 and final:

Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for debug
kernels, overhead low enough to be deployed in production.

We're aiming to get this in the next merge window, for 6.9. The feedback
we've gotten has been that even out of tree this patchset has already
been useful, and there's a significant amount of other work gated on the
code tagging functionality included in this patchset [2].

Example output:
root@moria-kvm:~# sort -h /proc/allocinfo|tail
3.11MiB 2850 fs/ext4/super.c:1408 module:ext4 func:ext4_alloc_inode
3.52MiB 225 kernel/fork.c:356 module:fork func:alloc_thread_stack_node
3.75MiB 960 mm/page_ext.c:270 module:page_ext func:alloc_page_ext
4.00MiB 2 mm/khugepaged.c:893 module:khugepaged func:hpage_collapse_alloc_folio
10.5MiB 168 block/blk-mq.c:3421 module:blk_mq func:blk_mq_alloc_rqs
14.0MiB 3594 include/linux/gfp.h:295 module:filemap func:folio_alloc_noprof
26.8MiB 6856 include/linux/gfp.h:295 module:memory func:folio_alloc_noprof
64.5MiB 98315 fs/xfs/xfs_rmap_item.c:147 module:xfs func:xfs_rui_init
98.7MiB 25264 include/linux/gfp.h:295 module:readahead func:folio_alloc_noprof
125MiB 7357 mm/slub.c:2201 module:slub func:alloc_slab_page

Since v2:
- tglx noticed a circular header dependency between sched.h and percpu.h;
a bunch of header cleanups were merged into 6.8 to ameliorate this [3].

- a number of improvements, moving alloc_hooks() annotations to the
correct place for better tracking (mempool), and bugfixes.

- looked at alternate hooking methods.
There were suggestions on alternate methods (compiler attribute,
trampolines), but they wouldn't have made the patchset any cleaner
(we still need to have different function versions for accounting vs. no
accounting to control at which point in a call chain the accounting
happens), and they would have added a dependency on toolchain
support.

Usage:
kconfig options:
- CONFIG_MEM_ALLOC_PROFILING
- CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
- CONFIG_MEM_ALLOC_PROFILING_DEBUG
adds warnings for allocations that weren't accounted because of a
missing annotation

sysctl:
/proc/sys/vm/mem_profiling

Runtime info:
/proc/allocinfo

Notes:

[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT

Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:

kmalloc pgalloc
(1 baseline) 6.764s 16.902s
(2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%)
(3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%)
(4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%)
(5 memcg) 13.388s (+97.94%) 48.460s (+186.71%)

Memory overhead:
Kernel size:

text data bss dec diff
(1) 26515311 18890222 17018880 62424413
(2) 26524728 19423818 16740352 62688898 264485
(3) 26524724 19423818 16740352 62688894 264481
(4) 26524728 19423818 16740352 62688898 264485
(5) 26541782 18964374 16957440 62463596 39183

Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags: 192 kB
PageExts: 262144 kB (256MB)
SlabExts: 9876 kB (9.6MB)
PcpuExts: 512 kB (0.5MB)

Total overhead is 0.2% of total memory.

[2]: Improved fault injection is the big one; the alloc_hooks() macro
this patchset introduces is also used for per-callsite fault injection
points in the dynamic fault injection patchset, which means we can
easily do fault injection on a per module or per file basis; this makes
it much easier to integrate memory fault injection into existing tests.

Vlastimil recently raised concerns about exposing GFP_NOWAIT as a
PF_MEMALLOC_* flag, as this might introduce GFP_NOWAIT to allocation
paths that have never had their failure paths tested - this is something
we need to address.

[3]: The circular dependency looks to be unavoidable; the issue is that
alloc_tag_save() -> current -> get_current() requires percpu.h, and
percpu.h requires sched.h because of course it does. But this doesn't
actually cause build errors because we're only using macros, so the main
concern is just not leaving a difficult-to-disentangle minefield for
later.
So, sched.h is now pretty close to being a types only header that
imports types and declares types - this is the header cleanups that were
merged for 6.8.


Kent Overstreet (11):
lib/string_helpers: Add flags param to string_get_size()
scripts/kallysms: Always include __start and __stop symbols
fs: Convert alloc_inode_sb() to a macro
mm/slub: Mark slab_free_freelist_hook() __always_inline
mempool: Hook up to memory allocation profiling
xfs: Memory allocation profiling fixups
mm: percpu: Introduce pcpuobj_ext
mm: percpu: Add codetag reference into pcpuobj_ext
mm: vmalloc: Enable memory allocation profiling
rhashtable: Plumb through alloc tag
MAINTAINERS: Add entries for code tagging and memory allocation
profiling

Suren Baghdasaryan (24):
mm: enumerate all gfp flags
mm: introduce slabobj_ext to support slab object extensions
mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext
creation
mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache
objects
slab: objext: introduce objext_flags as extension to
page_memcg_data_flags
lib: code tagging framework
lib: code tagging module support
lib: prevent module unloading if memory is not freed
lib: add allocation tagging support for memory allocation profiling
lib: introduce support for page allocation tagging
mm: percpu: increase PERCPU_MODULE_RESERVE to accommodate allocation
tags
change alloc_pages name in dma_map_ops to avoid name conflicts
mm: enable page allocation tagging
mm: create new codetag references during page splitting
mm/page_ext: enable early_page_ext when
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
lib: add codetag reference into slabobj_ext
mm/slab: add allocation accounting into slab allocation and free paths
mm/slab: enable slab allocation tagging for kmalloc and friends
mm: percpu: enable per-cpu allocation tagging
lib: add memory allocations report in show_mem()
codetag: debug: skip objext checking when it's for objext itself
codetag: debug: mark codetags for reserved pages as empty
codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext
allocations

Documentation/admin-guide/sysctl/vm.rst | 16 ++
Documentation/filesystems/proc.rst | 28 ++
MAINTAINERS | 16 ++
arch/alpha/kernel/pci_iommu.c | 2 +-
arch/mips/jazz/jazzdma.c | 2 +-
arch/powerpc/kernel/dma-iommu.c | 2 +-
arch/powerpc/mm/book3s64/radix_pgtable.c | 2 +-
arch/powerpc/platforms/ps3/system-bus.c | 4 +-
arch/powerpc/platforms/pseries/vio.c | 2 +-
arch/x86/kernel/amd_gart_64.c | 2 +-
drivers/block/virtio_blk.c | 4 +-
drivers/gpu/drm/gud/gud_drv.c | 2 +-
drivers/iommu/dma-iommu.c | 2 +-
drivers/mmc/core/block.c | 4 +-
drivers/mtd/spi-nor/debugfs.c | 6 +-
.../ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 4 +-
drivers/parisc/ccio-dma.c | 2 +-
drivers/parisc/sba_iommu.c | 2 +-
drivers/scsi/sd.c | 8 +-
drivers/staging/media/atomisp/pci/hmm/hmm.c | 2 +-
drivers/xen/grant-dma-ops.c | 2 +-
drivers/xen/swiotlb-xen.c | 2 +-
fs/xfs/kmem.c | 4 +-
fs/xfs/kmem.h | 10 +-
include/asm-generic/codetag.lds.h | 14 +
include/asm-generic/vmlinux.lds.h | 3 +
include/linux/alloc_tag.h | 188 +++++++++++++
include/linux/codetag.h | 83 ++++++
include/linux/dma-map-ops.h | 2 +-
include/linux/fortify-string.h | 5 +-
include/linux/fs.h | 6 +-
include/linux/gfp.h | 126 +++++----
include/linux/gfp_types.h | 101 +++++--
include/linux/memcontrol.h | 56 +++-
include/linux/mempool.h | 73 +++--
include/linux/mm.h | 8 +
include/linux/mm_types.h | 4 +-
include/linux/page_ext.h | 1 -
include/linux/pagemap.h | 9 +-
include/linux/percpu.h | 27 +-
include/linux/pgalloc_tag.h | 105 +++++++
include/linux/rhashtable-types.h | 11 +-
include/linux/sched.h | 24 ++
include/linux/slab.h | 184 +++++++------
include/linux/string.h | 4 +-
include/linux/string_helpers.h | 11 +-
include/linux/vmalloc.h | 60 +++-
init/Kconfig | 4 +
kernel/dma/mapping.c | 4 +-
kernel/kallsyms_selftest.c | 2 +-
kernel/module/main.c | 25 +-
lib/Kconfig.debug | 31 +++
lib/Makefile | 3 +
lib/alloc_tag.c | 213 +++++++++++++++
lib/codetag.c | 258 ++++++++++++++++++
lib/rhashtable.c | 52 +++-
lib/string_helpers.c | 22 +-
lib/test-string_helpers.c | 4 +-
mm/compaction.c | 7 +-
mm/filemap.c | 6 +-
mm/huge_memory.c | 2 +
mm/hugetlb.c | 8 +-
mm/kfence/core.c | 14 +-
mm/kfence/kfence.h | 4 +-
mm/memcontrol.c | 56 +---
mm/mempolicy.c | 52 ++--
mm/mempool.c | 36 +--
mm/mm_init.c | 10 +
mm/page_alloc.c | 66 +++--
mm/page_ext.c | 13 +
mm/page_owner.c | 2 +-
mm/percpu-internal.h | 26 +-
mm/percpu.c | 120 ++++----
mm/show_mem.c | 15 +
mm/slab.h | 176 ++++++++++--
mm/slab_common.c | 65 ++++-
mm/slub.c | 138 ++++++----
mm/util.c | 44 +--
mm/vmalloc.c | 88 +++---
scripts/kallsyms.c | 13 +
scripts/module.lds.S | 7 +
81 files changed, 2126 insertions(+), 695 deletions(-)
create mode 100644 include/asm-generic/codetag.lds.h
create mode 100644 include/linux/alloc_tag.h
create mode 100644 include/linux/codetag.h
create mode 100644 include/linux/pgalloc_tag.h
create mode 100644 lib/alloc_tag.c
create mode 100644 lib/codetag.c

--
2.43.0.687.g38aa6559b0-goog