Re: [PATCH v4 14/36] lib: add allocation tagging support for memory allocation profiling

From: Kent Overstreet
Date: Wed Feb 21 2024 - 18:29:42 EST


On Wed, Feb 21, 2024 at 03:05:32PM -0800, Kees Cook wrote:
> On Wed, Feb 21, 2024 at 11:40:27AM -0800, Suren Baghdasaryan wrote:
> > [...]
> > +struct alloc_tag {
> > + struct codetag ct;
> > + struct alloc_tag_counters __percpu *counters;
> > +} __aligned(8);
> > [...]
> > +#define DEFINE_ALLOC_TAG(_alloc_tag) \
> > + static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
> > + static struct alloc_tag _alloc_tag __used __aligned(8) \
> > + __section("alloc_tags") = { \
> > + .ct = CODE_TAG_INIT, \
> > + .counters = &_alloc_tag_cntr };
> > [...]
> > +static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
> > +{
> > + swap(current->alloc_tag, tag);
> > + return tag;
> > +}
>
> Future security hardening improvement idea based on this infrastructure:
> it should be possible to implement per-allocation-site kmem caches. For
> example, we could create:
>
> struct alloc_details {
> u32 flags;
> union {
> u32 size; /* not valid after __init completes */
> struct kmem_cache *cache;
> };
> };
>
> - add struct alloc_details to struct alloc_tag
> - move the tags section into .ro_after_init
> - extend alloc_hooks() to populate flags and size:
> .flags = __builtin_constant_p(size) ? KMALLOC_ALLOCATE_FIXED
> : KMALLOC_ALLOCATE_BUCKETS;
> .size = __builtin_constant_p(size) ? size : SIZE_MAX;
> - during kernel start or module init, walk the alloc_tag list
> and create either a fixed-size kmem_cache or to allocate a
> full set of kmalloc-buckets, and update the "cache" member.
> - adjust kmalloc core routines to use current->alloc_tag->cache instead
> of using the global buckets.
>
> This would get us fully separated allocations, producing better than
> type-based levels of granularity, exceeding what we have currently with
> CONFIG_RANDOM_KMALLOC_CACHES.
>
> Does this look possible, or am I misunderstanding something in the
> infrastructure being created here?

Definitely possible, but... would we want this? That would produce a
_lot_ of kmem caches, and don't we already try to collapse those where
possible to reduce internal fragmentation?