Re: [PATCH 1/2] x86/random: Retry on RDSEED failure

From: Jason A. Donenfeld
Date: Wed Jan 31 2024 - 13:01:27 EST


On Wed, Jan 31, 2024 at 6:37 PM Reshetova, Elena
<elena.reshetova@xxxxxxxxx> wrote:
>
>
>
> > On Wed, Jan 31, 2024 at 03:45:06PM +0100, Jason A. Donenfeld wrote:
> > > On Wed, Jan 31, 2024 at 09:07:56AM -0500, Theodore Ts'o wrote:
> > > > What about simply treating boot-time initialization of the /dev/random
> > > > state as special. That is, on x86, if the hardware promises that
> > > > RDSEED or RDRAND is available, we use them to initialization our RNG
> > > > state at boot. On bare metal, there can't be anyone else trying to
> > > > exhaust the on-chip RNG's entropy supply, so if RDSEED or RDRAND
> > > > aren't working available --- panic, since the hardware is clearly
> > > > busted.
> > >
> > > This is the first thing I suggested here:
> > https://lore.kernel.org/all/CAHmME9qsfOdOEHHw_MOBmt6YAtncbbqP9LPK2dRjuO
> > p1CrHzRA@xxxxxxxxxxxxxx/
> > >
> > > But Elena found this dissatisfying because we still can't guarantee new
> > > material later.
> >
> > Right, but this is good enough that modulo in-kernel RNG state
> > compromise, or the ability to attack the underlying cryptographic
> > primitives (in which case we have much bigger vulnerabilities than
> > this largely theoretical one), even if we don't have new material
> > later, the in-kernel RNG for the CC VM should be sufficiently
> > trustworthy for government work.
>
> I agree, this is probably the best we can do at the moment.
> I did want to point out the runtime need of fresh entropy also, but
> as we discussed in this thread we might not be able to get it
> without introducing a DoS path for the userspace.
> In this case, it is the best to only loose the forward prediction property
> vs. the whole Linux RNG.

So if this is what we're congealing around, I guess we can:

0) Leave RDSEED alone and focus on RDRAND.
1) Add `WARN_ON_ONCE(in_early_boot);` to the failure path of RDRAND
(and simply hope this doesn't get exploited for guest-guest boot DoS).
2) Loop forever in RDRAND on CoCo VMs, post-boot, with the comments
and variable naming making it clear that this is a hardware bug
workaround, not a "feature" added for "extra security".
3) Complain loudly to Intel and get them to fix the hardware.

Though, a large part of me would really like to skip that step (2),
first because it's a pretty gross bandaid that adds lots of
complexity, and second because it'll make (3) less poignant.

Jason