RE: [PATCH 1/2] x86/random: Retry on RDSEED failure

From: Reshetova, Elena
Date: Wed Feb 14 2024 - 13:07:34 EST


> Hi Elena,
>
> On Wed, Feb 14, 2024 at 4:18 PM Reshetova, Elena <elena.reshetova@xxxxxxxxx>
> wrote:
> > "The RdRand in a non-defective device is designed to be faster than the bus,
> > so when a core accesses the output from the DRNG, it will always get a
> > random number.
> > As a result, it is hard to envision a scenario where the RdRand, on a fully
> > functional device, will underflow.
> > The carry flag after RdRand signals an underflow so in the case of a defective chip,
> > this will prevent the code thinking it has a random number when it does not.
>
> That's really great news, especially combined with a very similar
> statement from Borislav about AMD chips:
>
> On Fri, Feb 9, 2024 at 10:45 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
> > Yeah, I know exactly what you mean and I won't go into details for
> > obvious reasons. Two things:
> >
> > * Starting with Zen3, provided properly configured hw RDRAND will never
> > fail. It is also fair when feeding the different contexts.
>
> I assume that this faster-than-the-bus-ness also takes into account the
> various accesses required to even switch contexts when scheduling VMs,
> so your proposed host-guest scheduling attack can't really happen
> either. Correct?

Yes, this attack won't be possible for RDRAND, so we are good.

>
> One clarifying question in all of this: what is the point of the "try 10
> times" advice? Is the "faster than the bus" statement actually "faster
> than the bus if you try 10 times"? Or is the "10 times" advice just old
> and not relevant.

The whitepaper should clarify this better in the future, but in short the
"retry 10 times" advice is not relevant given the above statement:
"when a core accesses the output from the DRNG, it will always get a
random number". There is no mention of retries there.
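
To make the difference between the two policies concrete, here is a minimal
user-space sketch. It is not kernel code and does not execute RDRAND: the
rng_attempt_fn callback, struct fake_source and fake_attempt() are
hypothetical stand-ins that model "one instruction execution that either
sets the carry flag or does not", so that both policies can be compared
without special hardware.

```c
#include <stdbool.h>

/* Hypothetical model of one RDRAND execution: returns true (carry flag set)
 * and stores a value in *out on success. Not a real CPU or kernel API. */
typedef bool (*rng_attempt_fn)(void *ctx, unsigned long *out);

/* Old policy: try up to `loops` times before reporting failure. */
bool rng_with_retries(rng_attempt_fn attempt, void *ctx,
		      unsigned int loops, unsigned long *out)
{
	while (loops--) {
		if (attempt(ctx, out))
			return true;
	}
	return false;
}

/* Policy in the proposed patch: one attempt, the caller sees the raw result. */
bool rng_single_shot(rng_attempt_fn attempt, void *ctx, unsigned long *out)
{
	return attempt(ctx, out);
}

/* Simulated source (assumption for illustration): fails a fixed number of
 * times, then succeeds, and counts how often it was asked. */
struct fake_source {
	unsigned int failures_left;
	unsigned int calls;
};

bool fake_attempt(void *ctx, unsigned long *out)
{
	struct fake_source *src = ctx;

	src->calls++;
	if (src->failures_left) {
		src->failures_left--;
		return false;
	}
	*out = 0x1234UL;	/* placeholder value, not random */
	return true;
}
```

With a source that never fails, both functions behave identically and make
exactly one attempt, which is the situation the statement above describes
for a non-defective device. The two only differ when an attempt fails, and
per that statement a failure would indicate a defective part rather than a
transient condition worth retrying.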

>
> In other words, is the following a reasonable patch?
>
> diff --git a/arch/x86/include/asm/archrandom.h
> b/arch/x86/include/asm/archrandom.h
> index 02bae8e0758b..2d5bf5aa9774 100644
> --- a/arch/x86/include/asm/archrandom.h
> +++ b/arch/x86/include/asm/archrandom.h
> @@ -13,22 +13,16 @@
>  #include <asm/processor.h>
>  #include <asm/cpufeature.h>
>
> -#define RDRAND_RETRY_LOOPS 10
> -
>  /* Unconditional execution of RDRAND and RDSEED */
>
>  static inline bool __must_check rdrand_long(unsigned long *v)
>  {
>  	bool ok;
> -	unsigned int retry = RDRAND_RETRY_LOOPS;
> -	do {
> -		asm volatile("rdrand %[out]"
> -			     CC_SET(c)
> -			     : CC_OUT(c) (ok), [out] "=r" (*v));
> -		if (ok)
> -			return true;
> -	} while (--retry);
> -	return false;
> +	asm volatile("rdrand %[out]"
> +		     CC_SET(c)
> +		     : CC_OUT(c) (ok), [out] "=r" (*v));
> +	WARN_ON(!ok);
> +	return ok;
>  }

Do you intend this as a generic RDRAND change, or also as a fix for the
CoCo case? Personally, I don't like WARN_ON from a security point of view,
but I know I am in the minority on this.

>
> static inline bool __must_check rdseed_long(unsigned long *v)
>
> (As for the RDSEED clarification, that also matches Borislav's reply, is
> what we expected and knew experimentally, and doesn't really have any
> bearing on Linux's RNG or this discussion, since RDRAND is all we need
> anyway.)

Agreed. I just wanted to include it for the overall picture.

>
> Regards,
> Jason