Re: [PATCH v3 1/1] Randomized slab caches for kmalloc()

From: GONG, Ruiqi
Date: Sun Jun 25 2023 - 07:25:43 EST

On 2023/06/22 2:21, Kees Cook wrote:
> On Fri, Jun 16, 2023 at 07:18:43PM +0800, GONG, Ruiqi wrote:
>> [...]
>>
>> Signed-off-by: GONG, Ruiqi <gongruiqi@xxxxxxxxxxxxxxx>
>> Co-developed-by: Xiu Jianfeng <xiujianfeng@xxxxxxxxxx>
>> Signed-off-by: Xiu Jianfeng <xiujianfeng@xxxxxxxxxx>
>
> I think this looks really good. Thanks for the respin! Some
> nits/comments/questions below, but I think this can land and get
> incrementally improved. Please consider it:
>
> Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
>

Thanks, Kees!

>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>> index 791f7453a04f..b7a5387f0dad 100644
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -19,6 +19,9 @@
>> #include <linux/workqueue.h>
>> #include <linux/percpu-refcount.h>
>>
>> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
>> +#include <linux/hash.h>
>> +#endif
>
> I think this can just be included unconditionally, yes?
>

True. Will change it.

>> [...]
>> +extern unsigned long random_kmalloc_seed;
>> +
>> +static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
>> {
>> /*
>> * The most common case is KMALLOC_NORMAL, so test for it
>> * with a single branch for all the relevant flags.
>> */
>> if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
>> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
>> + return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
>> + CONFIG_RANDOM_KMALLOC_CACHES_BITS);
>> +#else
>> return KMALLOC_NORMAL;
>> +#endif
>
> The commit log talks about having no runtime lookup, but that's not
> entirely true, given this routine. And xor and a hash_64... I wonder how
> expensive this is compared to some kind of constant expression that
> could be computed at build time... (the xor should stay, but that's
> "cheap").
>

To be precise, the random selection is currently fixed within each boot of the
system, but varies across different boots. In the commit log, I've added the
following paragraph to explain this feature, and will expand it a bit in the
next version:

"Meanwhile, the static random selection is further enhanced with a
per-boot random seed, which prevents an attacker from finding a usable
kmalloc() call site that happens to pick the same cache as the vulnerable
subsystem/module by analyzing the open source code."
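
To make that concrete, here is a rough userspace model (not the patch code
itself) of the selection above: the caller address is XOR'ed with the per-boot
seed and hashed down to CONFIG_RANDOM_KMALLOC_CACHES_BITS bits to pick one of
the cache copies. The seed and caller addresses below are made up, CACHE_BITS
is assumed to be 4 (i.e. 16 copies, matching the KMA_RAND_16 names in the
patch), and the hash constant mirrors the kernel's hash_64():

/*
 * Userspace sketch of the kmalloc cache-copy selection; illustrative only.
 * Seed and caller values are made up; GOLDEN_RATIO_64 is the multiplier
 * used by the kernel's hash_64() in include/linux/hash.h.
 */
#include <stdint.h>
#include <stdio.h>

#define GOLDEN_RATIO_64	0x61C8864680B583EBull
#define CACHE_BITS	4	/* assumed CONFIG_RANDOM_KMALLOC_CACHES_BITS */

static uint64_t hash_64(uint64_t val, unsigned int bits)
{
	/* multiplicative hash, keep the top 'bits' bits */
	return (val * GOLDEN_RATIO_64) >> (64 - bits);
}

int main(void)
{
	uint64_t random_kmalloc_seed = 0x243f6a8885a308d3ull; /* per-boot in the kernel */
	uint64_t callers[] = { 0xffffffff81234567ull, 0xffffffff81abcdefull };

	for (int i = 0; i < 2; i++)
		printf("caller %#llx -> cache copy %llu\n",
		       (unsigned long long)callers[i],
		       (unsigned long long)hash_64(callers[i] ^ random_kmalloc_seed,
						   CACHE_BITS));
	return 0;
}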

As for build-time hashing, I think it could theoretically be achieved, as long
as we had a compile-time random number generator. However, AFAIK compilers have
no support for this at the moment, and I could only find a few discussions
about it (in the C++ community).

>>
>> /*
>> * At least one of the flags has to be set. Their priorities in
>> @@ -577,7 +589,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
>>
>> index = kmalloc_index(size);
>> return kmalloc_trace(
>> - kmalloc_caches[kmalloc_type(flags)][index],
>> + kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>> flags, size);
>> }
>> return __kmalloc(size, flags);
>> @@ -593,7 +605,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
>>
>> index = kmalloc_index(size);
>> return kmalloc_node_trace(
>> - kmalloc_caches[kmalloc_type(flags)][index],
>> + kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>> flags, node, size);
>> }
>> return __kmalloc_node(size, flags, node);
>
> The use of _RET_IP_ is generally fine here, but I wonder about some of
> the allocation wrappers (like devm_kmalloc(), etc). I think those aren't
> being bucketed correctly? Have you checked that?
>

Yes, I checked the distribution of the slab caches in use by booting the
kernel in QEMU, and /proc/slabinfo shows that they are in general evenly
spread among the copies.

I think in most cases hashing on _RET_IP_ can effectively divert allocations
from different subsystems/modules into different caches. For example, calling
devm_kmalloc() from different places will acquire slab objects from different
cache copies:

xxx_func () {
    devm_kmalloc() {
        ------------ always inlined alloc_dr() ---------------
        __kmalloc_node_track_caller(..., _RET_IP_)
        -------------------------------------------------------
    }
    next inst. of devm_kmalloc()    // where _RET_IP_ takes
}

There are cases like sk_alloc(), where the wrapping is deep and all struct
sock allocations would gather into just a few caches:

sk_alloc() {
    sk_prot_alloc() {
        ------------ always inlined kmalloc() -----------------
        kmalloc_trace(... kmalloc_type(flags, _RET_IP_) ...)    // A
        __kmalloc(...) {
            __do_kmalloc_node(..., _RET_IP_)                    // B
        }
        --------------------------------------------------------
        next inst. of kmalloc()         // where B takes
    }
    next inst. of sk_prot_alloc()       // where A takes
}

But it's still better than nothing. Currently _RET_IP_ is the best
option I can think of, and in general it works.
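
For reference, _RET_IP_ is essentially the compiler's return-address builtin,
defined in the kernel headers roughly as:

#define _RET_IP_	(unsigned long)__builtin_return_address(0)

so it identifies the immediate caller of the inlined helper rather than the
outer wrapper, which is why the deeper sk_alloc() chain above collapses onto
only a couple of buckets.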

>> [...]
>> @@ -776,12 +781,44 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
>> #define KMALLOC_RCL_NAME(sz)
>> #endif
>>
>> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
>> +#define __KMALLOC_RANDOM_CONCAT(a, b) a ## b
>> +#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMA_RAND_, N)(sz)
>> +#if CONFIG_RANDOM_KMALLOC_CACHES_BITS >= 1
>> +#define KMA_RAND_1(sz) .name[KMALLOC_RANDOM_START + 0] = "kmalloc-random-01-" #sz,
>
> I wonder if this name is getting too long? Should "random" be "rnd" ?
> *shrug*
>

Okay. Will do.
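
Just to illustrate the shorter name, the first entry would then look something
like this (exact spelling to be settled in the next version):

#define KMA_RAND_1(sz)	.name[KMALLOC_RANDOM_START + 0] = "kmalloc-rnd-01-" #sz,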

>> [...]
>> +#define KMA_RAND_16(sz) KMA_RAND_15(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-random-16-" #sz,
>
> And if we wanted to save another character, this could be numbered 0-f,
> but I defer these aesthetics to Vlastimil. :)

Same with me ;)

>
> -Kees
>