Re: [PATCH v2 2/2] x86/coco: Require seeding RNG with RDRAND on CoCo systems

From: Jason A. Donenfeld
Date: Thu Feb 15 2024 - 08:23:04 EST


Hi Elena,

On Thu, Feb 15, 2024 at 07:30:32AM +0000, Reshetova, Elena wrote:
> Should we just go back to the approach to add one more check in random_init_early()
> to panic in the CoCo case if both rdseed and rdrand fails to give us anything?

Yea, no, definitely not. That is, in my opinion, completely backwards
and leads to impossible maintainability. CoCo is not some special
snowflake that gets to sprinkle random conditionals in generic code.

First, consider the motivation for doing this:
- This is to abort on a physical defective CPU bug -- presumably a
highly implausible thing to ever happen.
- This is for a threat model that few people are really compelled by
anyway, e.g. it's whack-a-mole season with host->guest vectors.
- This is for a single somewhat obscure configuration of a single
architecture with a feature only available on certain chipsets.
- This is not an "intrinsic" problem that necessitates plumbing complex
policy logic all over the place, but for a very special
driver-specific case.

Rather, what this patch does is...

> Now with this patch, the logic becomes

Your description actually wasn't quite accurate so I'll write it out
(and I'll clarify in the commit message/comments for v3 - my fault for
being vague):

1. At early boot, x86/CoCo is initialized. As part of that
initialization, it makes sure it can get 256 bits of RDRAND output
and adds it to the pool, in exactly the same way that the SD card
driver adds inserted memory card serial numbers to the pool. If it
can't get RDRAND output, it means CoCo loses one of its "Co"s, and so
it panic()s.

2. Later, the generic RNG initializes in random_init_early() and
random_init(), where it opportunistically tries to use everything it
can to get initialized -- architectural seed, architectural rand,
jitter, timers, boot seeds, *seeds passed from other drivers*, and
whatever else it can.

Now what you're complaining about is that in this step 2, we wind up
adding *even more* rdrand (though, more probably rdseed), in addition to
what was already added in the platform-specific driver in step 1. Boo
hoo? I can't see how that's a bad thing. Step 1 was CoCo's policy driver
*ensuring* that it was able to push at least *something good* into the
RNG, and taking a CoCo-specific policy decision (panic()ing) if it
can't. Step 2 is just generic RNG stuff doing its generic RNG thing.

You might also want to needle on the fact that if RDRAND is somehow
intermittently physically defective, and so step 1 succeeds, but in step
2, the RNG doesn't manage to get seeded by RDRAND and so initializes
based on jitter or IRQs or something. Okay, fine, but who cares? First,
you'd be talking here about a hugely unlikely defective hardware case,
and second, the end state remains basically identical: there's a good
seed from RDRAND and the RNG itself is able to initialize.

So I really don't want to litter the generic code with a bunch of
platform-specific hacks. The function add_device_randomness()
specifically exists so that individual platforms and devices that have
some unique insight into an entropy source or entropy requirements or
policy or whatever else can do that in their own platform or device
driver code where it belongs.

Jason