Re: [PATCH v2 4/9] slab: Introduce kmem_buckets_create()

From: Kees Cook
Date: Mon Mar 25 2024 - 16:40:44 EST


On Mon, Mar 25, 2024 at 03:40:51PM -0400, Kent Overstreet wrote:
> On Tue, Mar 05, 2024 at 02:10:20AM -0800, Kees Cook wrote:
> > Dedicated caches are available for fixed size allocations via
> > kmem_cache_alloc(), but for dynamically sized allocations there is only
> > the global kmalloc API's set of buckets available. This means it isn't
> > possible to separate specific sets of dynamically sized allocations into
> > a separate collection of caches.
> >
> > This leads to a use-after-free exploitation weakness in the Linux
> > kernel since many heap memory spraying/grooming attacks depend on using
> > userspace-controllable dynamically sized allocations to collide with
> > fixed size allocations that end up in the same cache.
> >
> > While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> > against these kinds of "type confusion" attacks, including for fixed
> > same-size heap objects, we can create a complementary deterministic
> > defense for dynamically sized allocations.
> >
> > In order to isolate user-controllable sized allocations from system
> > allocations, introduce kmem_buckets_create(), which behaves like
> > kmem_cache_create(). (The next patch will introduce kmem_buckets_alloc(),
> > which behaves like kmem_cache_alloc().)
> >
> > This allows for confining allocations to a dedicated set of sized
> > caches (which have the same layout as the kmalloc caches).
> >
> > This can also be used in the future once codetag allocation annotations
> > exist to implement per-caller allocation cache isolation[1] even for
> > dynamic allocations.
> >
> > Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
> > Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx>
> > ---
> > Cc: Vlastimil Babka <vbabka@xxxxxxx>
> > Cc: Christoph Lameter <cl@xxxxxxxxx>
> > Cc: Pekka Enberg <penberg@xxxxxxxxxx>
> > Cc: David Rientjes <rientjes@xxxxxxxxxx>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Roman Gushchin <roman.gushchin@xxxxxxxxx>
> > Cc: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> > Cc: linux-mm@xxxxxxxxx
> > ---
> > include/linux/slab.h | 5 +++
> > mm/slab_common.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 77 insertions(+)
> >
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index f26ac9a6ef9f..058d0e3cd181 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -493,6 +493,11 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> >  			   gfp_t gfpflags) __assume_slab_alignment __malloc;
> >  void kmem_cache_free(struct kmem_cache *s, void *objp);
> >
> > +kmem_buckets *kmem_buckets_create(const char *name, unsigned int align,
> > +				  slab_flags_t flags,
> > +				  unsigned int useroffset, unsigned int usersize,
> > +				  void (*ctor)(void *));
>
> I'd prefer an API that initialized an object over one that allocates it
> - that is, prefer
>
> 	kmem_buckets_init(kmem_buckets *buckets, ...)

Sure, that can work. kmem_cache_init() would need to exist for the same
reason though.

>
> by forcing it to be separately allocated, you're adding a pointer deref
> to every access.

I don't understand what you mean here. "every access"? I'll take a guess
below...

> That would also allow for kmem_buckets to be lazily initialized, which
> would play nicely with declaring the kmem_buckets in the alloc_hooks() macro.

Sure, I think it'll depend on how the per-site allocations get wired up.
I think you mean to include a full copy of the kmem cache/bucket
struct with the codetag instead of just a pointer? I don't think that'll
work well to make it runtime selectable, and I don't see it using an
extra deref -- allocations already get the struct from somewhere and
deref it. The only change is where to find the struct.

> I'm curious what all the arguments to kmem_buckets_create() are needed
> for, if this is supposed to be a replacement for kmalloc() users.

Are you confusing kmem_buckets_create() with kmem_buckets_alloc()? These
args are needed to initialize the per-bucket caches, just as is
already done for the global kmalloc per-bucket caches. This mirrors
kmem_cache_create(). (Or more specifically, calls kmem_cache_create()
for each bucket size, so the args need to be passed through.)
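
To make the pass-through concrete, here is a rough userspace model, not
the code from the patch. Every name in it (model_cache, model_buckets,
model_buckets_setup, model_buckets_pick) is hypothetical, the buckets are
plain powers of two from 8 to 8192 bytes (the real kmalloc layout also
has sizes such as 96 and 192), and clamping the usercopy window to each
bucket's object size is an assumption of the model. It fills
caller-provided storage only to keep the sketch free of allocation.

```c
#include <stddef.h>

#define MODEL_MIN_SHIFT		3	/* smallest bucket: 8 bytes */
#define MODEL_MAX_SHIFT		13	/* largest bucket: 8192 bytes */
#define MODEL_NR_BUCKETS	(MODEL_MAX_SHIFT - MODEL_MIN_SHIFT + 1)

/* Hypothetical stand-in for one per-bucket cache. */
struct model_cache {
	size_t object_size;	/* size of every object in this bucket */
	size_t useroffset;	/* start of the usercopy window */
	size_t usersize;	/* length of the usercopy window */
};

/* Hypothetical stand-in for a dedicated set of buckets. */
struct model_buckets {
	struct model_cache caches[MODEL_NR_BUCKETS];
};

/* Set up one cache per bucket size, passing the same window to each. */
static void model_buckets_setup(struct model_buckets *b,
				size_t useroffset, size_t usersize)
{
	for (size_t i = 0; i < MODEL_NR_BUCKETS; i++) {
		struct model_cache *c = &b->caches[i];
		size_t size = (size_t)1 << (MODEL_MIN_SHIFT + i);

		c->object_size = size;
		if (useroffset >= size) {
			/* Model assumption: window misses the object entirely. */
			c->useroffset = 0;
			c->usersize = 0;
		} else {
			/* Model assumption: clamp the window to the object. */
			size_t room = size - useroffset;

			c->useroffset = useroffset;
			c->usersize = usersize < room ? usersize : room;
		}
	}
}

/* Pick the smallest bucket that fits, or NULL if the request is too big. */
static const struct model_cache *
model_buckets_pick(const struct model_buckets *b, size_t size)
{
	for (size_t i = 0; i < MODEL_NR_BUCKETS; i++)
		if (size <= b->caches[i].object_size)
			return &b->caches[i];
	return NULL;
}
```

In this model a 100-byte request lands in the 128-byte bucket of its own
set, and every bucket in the set carries the window that was passed at
setup time rather than a window covering the whole object.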

If you mean "why expose these arguments because they can just use the
existing defaults already used by the global kmalloc caches" then I
would say, it's to gain the benefit here of narrowing the scope of the
usercopy offsets. Right now kmalloc is forced to allow the full usercopy
window into an allocation, but we don't have to do this any more. For
example, see patch 8, where struct msg_msg doesn't need to expose the
header to userspace:

	msg_buckets = kmem_buckets_create("msg_msg", 0, SLAB_ACCOUNT,
					  sizeof(struct msg_msg),
					  DATALEN_MSG, NULL);

Only DATALEN_MSG many bytes, starting at sizeof(struct msg_msg), will be
allowed to be copied in/out of userspace. Before, it was unbounded.
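
As a rough illustration of what the window means, here is a small
userspace sketch of a bounds check in the spirit of the hardened
usercopy test. It is not the kernel's implementation: struct
usercopy_window and usercopy_allowed() are hypothetical names, and the
48 and 4048 used in the note after the code are only illustrative
stand-ins for sizeof(struct msg_msg) and DATALEN_MSG.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache's usercopy window. */
struct usercopy_window {
	size_t useroffset;	/* first byte that may be copied */
	size_t usersize;	/* number of bytes that may be copied */
};

/*
 * Return true if the n bytes starting at offset (relative to the start
 * of the object) lie entirely inside the window.
 */
static bool usercopy_allowed(const struct usercopy_window *w,
			     size_t offset, size_t n)
{
	/* Copies that start before the window are rejected. */
	if (offset < w->useroffset)
		return false;
	/* Copies that start past the end of the window are rejected. */
	if (offset - w->useroffset > w->usersize)
		return false;
	/* The copy must fit in what is left of the window. */
	return n <= w->usersize - (offset - w->useroffset);
}
```

With a window of { 48, 4048 }, a copy of 8 bytes at offset 0 (the
header) is refused, while a copy of 4048 bytes at offset 48 is accepted.
With a whole-object window such as { 0, 4096 }, the same header copy
would pass.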

-Kees

--
Kees Cook