Re: [RFC] Randomness on confidential computing platforms

From: Kirill A. Shutemov
Date: Mon Jan 29 2024 - 05:27:23 EST


On Fri, Jan 26, 2024 at 07:23:55AM -0800, Sean Christopherson wrote:
> On Fri, Jan 26, 2024, Kirill A. Shutemov wrote:
> > Problem Statement
> >
> > Currently Linux RNG uses the random inputs obtained from x86
> > RDRAND/RDSEED instructions (if present) during early initialization
> > stage (by mixing the obtained input into the random pool via
> > _mix_pool_bytes()), as well as for seeding/reseeding ChaCha-based CRNG.
> > When the calls to both RDRAND/RDSEED fail (including RDRAND internal
> > retries), the timing-based fallbacks are used in the latter case, and
> > during the early boot case this source of entropy input is simply
> > skipped. Overall Linux RNG has many other sources of entropy that it
> > uses (also depending on what HW is used), but the dominating one is
> > interrupts.
> >
> > In a Confidential Computing Guest threat model, given the absence of any
> > special trusted HW for the secure entropy input, RDRAND/RDSEED
> > instructions is the only entropy source that is unobservable outside of
> > Confidential Computing Guest TCB. However, with enough pressure on these
> > instructions from multiple cores (see Intel SDM, Volume 1, Section
> > 7.3.17, “Random Number Generator Instructions”), they can be made to
> > fail on purpose and force the Confidential Computing Guest Linux RNG to
> > use only Host/VMM controlled entropy sources.
> >
> > Solution options
> >
> > There are several possible solutions to this problem and the intention
> > of this RFC is to initiate a joined discussion. Here are some options
> > that has been considered:
> >
> > 1. Do nothing and accept the risk.
> > 2. Force endless looping on RDRAND/RDSEED instructions when run in a
> > Confidential Computing Guest (this patch). This option turns the
> > attack against the quality of cryptographic randomness provided by
> > Confidential Computing Guest’s Linux RNG into a DoS attack against
> > the Confidential Computing Guest itself (DoS attack is out of scope
> > for the Confidential Computing threat model).
> > 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
> > Another DoS variant against the Guest.
> > 4. Exit to the host/VMM with an error indication after a Confidential
> > Computing Guest failed to obtain random input from RDRAND/RDSEED
> > instructions after reasonable number of retries. This option allows
> > host/VMM to take some correction action for cases when the load on
> > RDRAND/RDSEED instructions has been put by another actor, i.e. the
> > other guest VM. The exit to host/VMM in such cases can be made
> > transparent for the Confidential Computing Guest in the TDX case with
> > the assistance of the TDX module component.
>
> Hell no. Develop better hardware if you want to guarantee forward progress.
> Don't push more complexity into the host stack for something that in all likelihood
> will never happen outside of buggy software or hardware.

My idea for this option was to make TDH.VP.ENTER return TDX_RND_NO_ENTROPY
in such case. VMM can simply retry or maybe schedule other workload and
let entropy pool to recover.

I don't think making RDRAND/RDSEED never-fail on HW level is feasible. And
it is definitely not guaranteed by current architecture.

> > 5. Anything other better option?
>
> Give the admin the option to choose between "I don't care, carry-on with less
> randomness" and "I'm paranoid, panic, panic, panic!". In other words, let the
> admin choose between #1 and #3 at boot time. You could probably even let the
> admin control the number of retries, though that's probably a bit excessive.
>
> And don't tie it to CoCo VMs, e.g. if someone is relying on randomness for a bare
> metal workload, they might prefer to panic if hardware is acting funky.

If we go this path, I still the option has to have strict default for
CoCo VMs as they don't have options to fallback to.

--
Kiryl Shutsemau / Kirill A. Shutemov