Re: [PATCH 1/2] x86/random: Retry on RDSEED failure

From: Jason A. Donenfeld
Date: Wed Feb 14 2024 - 14:32:56 EST


Hi Elena,

On Wed, Feb 14, 2024 at 05:59:48PM +0000, Reshetova, Elena wrote:
> >
> > In other words, is the following a reasonable patch?
> >
> > diff --git a/arch/x86/include/asm/archrandom.h
> > b/arch/x86/include/asm/archrandom.h
> > index 02bae8e0758b..2d5bf5aa9774 100644
> > --- a/arch/x86/include/asm/archrandom.h
> > +++ b/arch/x86/include/asm/archrandom.h
> > @@ -13,22 +13,16 @@
> > #include <asm/processor.h>
> > #include <asm/cpufeature.h>
> >
> > -#define RDRAND_RETRY_LOOPS 10
> > -
> > /* Unconditional execution of RDRAND and RDSEED */
> >
> > static inline bool __must_check rdrand_long(unsigned long *v)
> > {
> > bool ok;
> > - unsigned int retry = RDRAND_RETRY_LOOPS;
> > - do {
> > - asm volatile("rdrand %[out]"
> > - CC_SET(c)
> > - : CC_OUT(c) (ok), [out] "=r" (*v));
> > - if (ok)
> > - return true;
> > - } while (--retry);
> > - return false;
> > + asm volatile("rdrand %[out]"
> > + CC_SET(c)
> > + : CC_OUT(c) (ok), [out] "=r" (*v));
> > + WARN_ON(!ok);
> > + return ok;
> > }
>
> Do you intend this as a generic rdrand change or also a fix for CoCo
> case problem?

I was thinking generic, since in all cases, RDRAND failing points to a
hardware bug in the CPU ITSELF (!), which is solid grounds for a WARN().

> I personally don’t like WARN_ON from security
> pov, but I know I am in minority with this.

I share the same opinion as you, that WARN_ON() is a little weak and we
should BUG_ON() or panic() or whatever, but I also know that this ship
has really sailed long ago, that in lots of ways Linus is also right
that BUG() is bad and shouldn't be used for much, and this just isn't a
hill to die on. And the "panic_on_warn" flag exists and "security guides"
sometimes say to turn this on, etc, so I think WARN_ON() remains the
practical compromise that won't get everyone's feathers ruffelled up.


By the way, there is still one question lingering in the back of my
mind, but I don't know if answering it would divulge confidential
implementation details.

You said that RDRAND is faster than the bus, so failures won't be
observable, while RDSEED is not because it requires collecting entropy
from the ether which is slow. That makes intuitive sense on a certain
dumb simplistic level: AES is just an algorithm so is fast, while
entropy collection is a more physical thing so is slow. But if you read
the implementation details, RDRAND is supposed to reseed after 511
calls. So what's to stop you from exhausting RDSEED in one place, while
also getting RDRAND to the end of its 511 calls, and *then* having your
victim make the subsequent RDRAND call, which tries to reseed (or is in
progress of doing so), finds that RDSEED is out of batteries, and
underflows? What's the magic detail that makes this scenario not
possible?

Jason