RE: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

From: Reshetova, Elena
Date: Tue Apr 16 2019 - 12:53:10 EST


> So a couple of comments; I wasn't able to find the full context for
> this patch, and looking over the thread on kernel-hardening from late
> February still left me confused exactly what attacks this would help
> us protect against (since this isn't my area and I didn't take the
> time to read all of the links to slide decks, etc.)
>
> So I'm not going to comment on the utility of this patch, but just on
> the random number generator issues. If you're only going to be using
> the low 8 bits of the output of get_prandom_u32(), even if two
> adjacent calls to get_prandom_u32() (for which only the low 8 bits are
> revealed) can be used to precisely identify which set of 2**120
> potential prandom states could have generated that pair of states, it's
> still going to take a lot of calls before you'd be able to figure out
> the prandom's internal state.
>
> It seems, though, that we're assuming the attacker has
> arbitrary ability to get the low bits of the stack, so *if* that's
> true, then eventually, you'd be able to get enough samples that you
> could reverse engineer the prandom state. This could take long enough
> that the process will have gotten rescheduled to another CPU, and
> since the prandom state is per-cpu, that adds another wrinkle.

Well, yes, my feeling is also that this is going to be hard to do, but can we get
some more concrete numbers on this? We can forget about per-cpu rescheduling
for simplicity and just calculate how many calls it would take to recover the state
given that each call leaks 5 bits.
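
As a starting point, here is only the trivial information-theoretic lower bound,
not an estimate of real attack cost. It assumes a 128-bit prandom state, which is
what the "2**120 candidates left after 8 revealed bits" figure quoted above implies;
min_calls() is a hypothetical helper name, not anything from the kernel.

```c
/*
 * Lower bound on the number of observed outputs needed before the leaked
 * bits could, in principle, determine a PRNG state of state_bits bits.
 * This is just ceil(state_bits / leak_bits). It says nothing about how
 * much work solving for the state would take.
 */
static unsigned int min_calls(unsigned int state_bits, unsigned int leak_bits)
{
	return (state_bits + leak_bits - 1) / leak_bits;
}
```

With these assumptions, min_calls(128, 5) is 26 and min_calls(128, 8) is 16, so an
attacker needs at least that many consecutive samples from the same CPU. The actual
number is likely higher and depends on the structure of the generator.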

I can try to do this calculation based on my limited knowledge of crypto,
but I will have to read papers on this PRNG first, etc., so I was just checking if
people already have a feeling for it, given how common this generator is in the kernel.

>
> > > So the argument against using TSC directly was that it might be easy to
> > > guess most of the TSC bits in timing attack. But IIRC there is fairly
> > > solid evidence that the lowest TSC bits are very hard to guess and might
> > > in fact be a very good random source.
> > >
> > > So what one could do, is for each invocation mix in the low (2?) bits of
> > > the TSC into a per-cpu/task PRNG state. By always adding some fresh
> > > entropy it would become very hard indeed to predict the outcome, even
> > > for otherwise 'trivial' PRNGs.
> >
> > You could just feed 8 bits of TSC into a CRC. Or even xor the
> > entire TSC over a CRC state and then cycle it at least 6 bits.
> > Probably doesn't matter which CRC - but you may want one that is
> > cheap in software. Even a 16bit CRC might be enough.
>
> Do we only care about x86 in this discussion? Given "x86/entry/64",
> I'm guessing the answer is that we're not trying to worry about how to protect
> other architectures, like say ARM, that don't have a TSC?

Well, this patch is for x86 only, but other architectures might want to have similar
functionality, I guess...

>
> If we do care about architectures w/o a TSC, how much cost are we
> willing to pay as far as system call overhead is concerned?

Good question. I don't know exactly what syscall overhead is acceptable
for such a feature, but it has to be very light to be useful, otherwise
the config would never be turned on.

>
> If it's x86 specific, maybe the simplest thing to do is to use RDRAND
> if it exists, and fall back to something involving a TSC and maybe
> prandom_u32 (depending on how bad you think the stack leak is going to
> be) if RDRAND isn't available?
>

RDRAND is way too slow, so it is out. That's why we were looking into other
options for fast randomness. Unfortunately, it looks like we don't have
that many options in the kernel.
rdtsc (its low bits) was the original candidate, and Peter Zijlstra pointed to a paper
that claims it is a good source of randomness:

http://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.html

But I don't have enough knowledge on this to make a good judgment.
The original grsecurity patch used the low bits of rdtsc for the in-stack random offset.
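
To make the "mix low TSC bits into a per-cpu state" idea from earlier in the thread
concrete, here is a rough userspace sketch. Everything in it is an assumption for
illustration: the helper names, the choice of xorshift32 as the cheap mixing step,
and the 8-bit offset width. The timestamp is passed in as a parameter so the sketch
does not depend on rdtsc being available. It is not what the patch or grsecurity does.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fold the low 2 bits of a timestamp counter value
 * into a 32-bit state, then advance the state by one xorshift32 step.
 */
static uint32_t mix_tsc_bits(uint32_t state, uint64_t tsc)
{
	state ^= (uint32_t)(tsc & 0x3);	/* fresh low-order timer bits */
	if (state == 0)
		state = 1;		/* xorshift32 would stay stuck at zero */
	state ^= state << 13;
	state ^= state >> 17;
	state ^= state << 5;
	return state;
}

/*
 * Hypothetical helper: take the low 8 bits of the state as the offset.
 * Any alignment requirements would reduce the usable bits further.
 */
static unsigned int stack_offset(uint32_t state)
{
	return state & 0xFF;
}
```

The point of the sketch is only that the per-syscall cost is a handful of ALU
operations on top of reading the counter; whether the low TSC bits are really
unpredictable enough is exactly the open question.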

Best Regards,
Elena