Re: [PATCH 1/2] x86/random: Retry on RDSEED failure

From: Jason A. Donenfeld
Date: Tue Jan 30 2024 - 09:06:38 EST


On Tue, Jan 30, 2024 at 2:10 PM Reshetova, Elena
<elena.reshetova@xxxxxxxxx> wrote:
> The internals of Intel DRBG behind RDRAND/RDSEED has been publicly
> documented, so the structure is no secret. Please see [1] for overall
> structure and other aspects. So, yes, your overall understanding is correct
> (there are many more details though).

Indeed, I have read it.

> > So maybe this patch #1 (of 2) can be dropped?
>
> Before we start debating this patchset, what is your opinion on the original
> problem we raised for CoCo VMs when both RDRAND/RDSEED are made to
> fail deliberately?

My general feeling is that this seems like a hardware problem.

If you have a VM, the hypervisor should provide a seed. But with CoCo,
you can't trust the host to do that. But can't the host do anything to
the VM that it wants, like fiddle with its memory? No, there are
special new hardware features to encrypt and protect RAM to prevent
this. So if you've found yourself in a situation where you absolutely
cannot trust the host, AND the hardware already has working guest
protections from the host, then it would seem you also need a hardware
solution to handle seeding. And you're claiming that RDRAND/RDSEED is
the *only* hardware solution available for it.

Is that an accurate summary? If it is, then the actual problem is that
the hardware provided to solve this problem doesn't actually solve it
that well, so we're caught deciding between guest-guest DoS (some
other guest on the system uses all RDRAND resources) and cryptographic
failure because of a malicious host creating a deterministic
environment.

But I have two questions:

1) Is this CoCo VM stuff even real? Is protecting guests from hosts
actually possible in the end? Is anybody doing this? I assume they
are, so maybe ignore this question, but I would like to register my
gut feeling that on the Intel platform this seems like an endless
whack-a-mole problem like SGX.

2) Can a malicious host *actually* create a fully deterministic
environment? One that'll produce the same timing for the jitter
entropy creation, and all the other timers and interrupts and things?
I imagine the attestation part of CoCo means these VMs need to run on
real Intel silicon and so they can't be single-stepped in TCG or
something, right? So is this problem actually a real one? And to what
degree? Any good experimental research on this?

Either way, if you're convinced RDRAND is the *only* way here, adding
a `WARN_ON(is_in_early_boot)` to the RDRAND (but not RDSEED) failure
path seems like a fairly lightweight band-aid. I just wonder if the hardware
people could come up with something more reliable that we wouldn't
have to agonize over in the kernel.
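
For illustration only, here is a minimal userspace sketch of that idea
in C. It is not the kernel's code: `rng_step_fn`, `in_early_boot`,
`warn_count`, and the two fake sources are hypothetical stand-ins so
the retry-then-warn logic can run anywhere, the retry count of 10 is an
illustrative choice, and the `fprintf` merely plays the role that
`WARN_ON()` would play in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative retry budget; an assumption for this sketch. */
#define RDRAND_RETRY_LOOPS 10

/*
 * Hypothetical stand-in for one RDRAND attempt: returns true and
 * fills *out on success, returns false if the attempt failed.
 */
typedef bool (*rng_step_fn)(uint64_t *out);

/* Hypothetical flag; a real kernel would track boot state differently. */
static bool in_early_boot = true;

/* Counts emitted warnings so the behaviour can be observed. */
static unsigned int warn_count;

/*
 * Try the source up to RDRAND_RETRY_LOOPS times. If every attempt
 * fails while in early boot, emit a warning (userspace analogue of
 * WARN_ON) and report failure to the caller.
 */
static bool rdrand_with_retry(rng_step_fn step, uint64_t *out)
{
	assert(step != NULL && out != NULL);

	for (int i = 0; i < RDRAND_RETRY_LOOPS; i++) {
		if (step(out))
			return true;
	}

	if (in_early_boot) {
		warn_count++;
		fprintf(stderr, "WARN: RDRAND failed %d times in early boot\n",
			RDRAND_RETRY_LOOPS);
	}
	return false;
}

/* Fake source that never succeeds, e.g. a host forcing failures. */
static bool always_fail(uint64_t *out)
{
	(void)out;
	return false;
}

/* Fake source that fails twice, then succeeds with a fixed value. */
static unsigned int flaky_calls;
static bool succeed_on_third(uint64_t *out)
{
	if (++flaky_calls < 3)
		return false;
	*out = 0x1234;
	return true;
}
```

Calling `rdrand_with_retry(succeed_on_third, &v)` returns true without
warning, while `rdrand_with_retry(always_fail, &v)` exhausts the budget
and warns only while `in_early_boot` is set. The RDSEED path would
deliberately get no such warning, since RDSEED failing under contention
is expected behaviour.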

Jason