Re: [RFC] arm64: syscall: Direct PRNG kstack randomization

From: Kees Cook
Date: Wed Feb 21 2024 - 01:33:49 EST


On Tue, Feb 20, 2024 at 08:02:58PM -0600, Jeremy Linton wrote:
> The existing arm64 stack randomization uses the kernel rng to acquire
> 5 bits of address space randomization. This is problematic because it
> creates non-determinism in the syscall path when the rng needs to be
> generated or reseeded. This shows up as large tail latencies in some
> benchmarks and directly affects the minimum RT latencies as seen by
> cyclictest.

Some questions:

- for benchmarks, why not disable kstack randomization?
- if the existing pRNG reseeding is a problem here, why isn't it a
problem in the many other places it's used?
- I thought the pRNG already did out-of-line reseeding?

> Other architectures are using timers/cycle counters for this function,
> which is sketchy from a randomization perspective because it should be
> possible to estimate this value from knowledge of the syscall return
> time, and from reading the current value of the timer/counters.

The expectation is that it would be, at best, unstable.

> So, a poor rng should be better than the cycle counter if it is hard
> to extract the stack offsets sufficiently to be able to detect the
> PRNG's period.
>
> So, we can potentially choose a 'better' or larger PRNG, going as far
> as using one of the CSPRNGs already in the kernel, but the overhead
> increases appropriately. Further, there are a few options for
> reseeding, possibly out of the syscall path, but is it even useful in
> this case?

I'd love to find a way to avoid a pRNG that could be reconstructed
given enough samples. (But perhaps this xorshift RNG resists that?)
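
To make that concern concrete: the xorshift32 step in the patch is a
bijection on 32-bit states, so it can be stepped backwards as well as
forwards. A quick userspace sketch (Python, seed values chosen
arbitrarily for illustration) of the same update, plus its inverse:

```python
M = 0xffffffff  # work modulo 2^32, as the kernel's u32 does

def xorshift32(s):
    # Same step as the patch (Marsaglia, "Xorshift RNGs", p. 4).
    s ^= (s << 13) & M
    s ^= s >> 17
    s ^= (s << 5) & M
    return s

def xorshift32_inv(s):
    # Each xor-with-shift is invertible, so the whole step is too:
    # recovering the 32-bit state at any point reveals every past
    # and future state, and hence every 9-bit offset (state & 0x1ff).
    s ^= (s << 5) & M
    s ^= (s << 10) & M
    s ^= (s << 20) & M   # undoes s ^= s << 5
    s ^= s >> 17         # undoes s ^= s >> 17 (17 * 2 > 32)
    s ^= (s << 13) & M
    s ^= (s << 26) & M   # undoes s ^= s << 13
    return s
```

So the open question is only how many masked 9-bit samples an attacker
needs to pin down the full state, not whether the state determines the
sequence.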

-Kees

> Reported-by: James Yang <james.yang@xxxxxxx>
> Reported-by: Shiyou Huang <shiyou.huang@xxxxxxx>
> Signed-off-by: Jeremy Linton <jeremy.linton@xxxxxxx>
> ---
> arch/arm64/kernel/syscall.c | 55 ++++++++++++++++++++++++++++++++++++-
> 1 file changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index 9a70d9746b66..70143cb8c7be 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -37,6 +37,59 @@ static long __invoke_syscall(struct pt_regs *regs, syscall_fn_t syscall_fn)
> return syscall_fn(regs);
> }
>
> +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
> +DEFINE_PER_CPU(u32, kstackrng);
> +static u32 xorshift32(u32 state)
> +{
> + /*
> + * From the top of page 4 of Marsaglia, "Xorshift RNGs".
> + * This algorithm is intended to have a period of 2^32 - 1,
> + * and should not be used anywhere outside of this code path.
> + */
> + state ^= state << 13;
> + state ^= state >> 17;
> + state ^= state << 5;
> + return state;
> +}
> +
> +static u16 kstack_rng(void)
> +{
> + u32 rng = raw_cpu_read(kstackrng);
> +
> + rng = xorshift32(rng);
> + raw_cpu_write(kstackrng, rng);
> + return rng & 0x1ff;
> +}
> +
> +/* Should we reseed? */
> +static int kstack_rng_setup(unsigned int cpu)
> +{
> + u32 rng_seed;
> +
> + do {
> + rng_seed = get_random_u32();
> + } while (!rng_seed);
> + raw_cpu_write(kstackrng, rng_seed);
> + return 0;
> +}
> +
> +static int kstack_init(void)
> +{
> + int ret;
> +
> + ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm64/cpuinfo:kstackrandomize",
> + kstack_rng_setup, NULL);
> + if (ret < 0)
> + pr_err("kstack: failed to register rng callbacks.\n");
> + return 0;
> +}
> +
> +arch_initcall(kstack_init);
> +#else
> +static u16 kstack_rng(void) { return 0; }
> +#endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
> +
> static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
> unsigned int sc_nr,
> const syscall_fn_t syscall_table[])
> @@ -66,7 +119,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
> *
> * The resulting 5 bits of entropy is seen in SP[8:4].
> */
> - choose_random_kstack_offset(get_random_u16() & 0x1FF);
> + choose_random_kstack_offset(kstack_rng());
> }
>
> static inline bool has_syscall_work(unsigned long flags)
> --
> 2.43.0
>

--
Kees Cook