RE: [PATCH v2 2/2] x86/coco: Require seeding RNG with RDRAND on CoCo systems

From: Reshetova, Elena
Date: Thu Feb 15 2024 - 02:30:50 EST


>
> There are few uses of CoCo that don't rely on working cryptography and
> hence a working RNG. Unfortunately, the CoCo threat model means that the
> VM host cannot be trusted and may actively work against guests to
> extract secrets or manipulate computation. Since a malicious host can
> modify or observe nearly all inputs to guests, the only remaining source
> of entropy for CoCo guests is RDRAND.
>
> If RDRAND is broken -- due to CPU hardware fault -- the generic path
> will WARN(), but probably CoCo systems shouldn't even continue
> executing. This is mostly a concern at boot time when initially seeding
> the RNG, as after that the consequences of a broken RDRAND are much more
> theoretical.
>
> So, try at boot to seed the RNG using 256 bits of RDRAND output. If this
> fails, panic(). This will also trigger if the system is booted without
> RDRAND, as RDRAND is essential for a safe CoCo boot.
>
> This patch is deliberately written to be "just a CoCo x86 driver
> feature" and not part of the RNG itself. Many device drivers and
> platforms have some desire to contribute something to the RNG, and
> add_device_randomness() is specifically meant for this purpose. Any
> driver can call this with seed data of any quality, or even garbage
> quality, and it can only possibly make the quality of the RNG better or
> have no effect, but can never make it worse. Rather than trying to
> build something into the core of the RNG, this patch interprets the
> particular CoCo issue as just a CoCo issue, and therefore separates this
> all out into driver (well, arch/platform) code.
>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Daniel P. Berrangé <berrange@xxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: Elena Reshetova <elena.reshetova@xxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Theodore Ts'o <tytso@xxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
> ---
> Changes v1->v2:
> - panic() instead of BUG_ON(), as suggested by Andi Kleen.
> - Update comments, now that we have info from AMD and Intel.
>
> arch/x86/coco/core.c | 36 ++++++++++++++++++++++++++++++++++++
> arch/x86/include/asm/coco.h | 2 ++
> arch/x86/kernel/setup.c | 2 ++
> 3 files changed, 40 insertions(+)
>
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index eeec9986570e..34d7c6795e8c 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -3,13 +3,16 @@
> * Confidential Computing Platform Capability checks
> *
> * Copyright (C) 2021 Advanced Micro Devices, Inc.
> + * Copyright (C) 2024 Jason A. Donenfeld <Jason@xxxxxxxxx>. All Rights Reserved.
> *
> * Author: Tom Lendacky <thomas.lendacky@xxxxxxx>
> */
>
> #include <linux/export.h>
> #include <linux/cc_platform.h>
> +#include <linux/random.h>
>
> +#include <asm/archrandom.h>
> #include <asm/coco.h>
> #include <asm/processor.h>
>
> @@ -153,3 +156,36 @@ __init void cc_set_mask(u64 mask)
> {
> cc_mask = mask;
> }
> +
> +__init void cc_random_init(void)
> +{
> + unsigned long rng_seed[32 / sizeof(long)];
> + size_t i, longs;
> +
> + if (cc_vendor == CC_VENDOR_NONE)
> + return;
> +
> + /*
> + * Since the CoCo threat model includes the host, the only reliable
> + * source of entropy that can be neither observed nor manipulated is
> + * RDRAND. Usually, RDRAND failure is considered tolerable, but since a
> + * host can possibly induce failures consistently, it's important to at
> + * least ensure the RNG gets some initial random seeds.
> + */
> + for (i = 0; i < ARRAY_SIZE(rng_seed); i += longs) {
> + longs = arch_get_random_longs(&rng_seed[i],
> ARRAY_SIZE(rng_seed) - i);
> +
> + /*
> + * A zero return value means that the guest doesn't have RDRAND
> + * or the CPU is physically broken, and in both cases that
> + * means most crypto inside of the CoCo instance will be
> + * broken, defeating the purpose of CoCo in the first place. So
> + * just panic here because it's absolutely unsafe to continue
> + * executing.
> + */
> + if (longs == 0)
> + panic("RDRAND is defective.");
> + }
> + add_device_randomness(rng_seed, sizeof(rng_seed));


While this patch functionally does close to what we need, i.e. adds 256 bits to the pool
from rdrand and panics otherwise, I personally find it quite confusing in the overall
architecture of Linux RNG. You have done a lot of great work to clean the arch
back in 2022, and currently we have two straightforward paths where the entropy is
added from rdrand/rdseed into the pool and seed production with proper entropy
accounting. Yes, I think everyone would agree (and it has been pointed in numerous
papers/reports) that entropy accounting is somewhat doomed business, but this
is what you have to do at least initially in order to still maintain the overall
logic of why Linux RNG does what it does.

Now with this patch, the logic becomes somewhat messy:

1. in early boot Linux RNG will attempt to get inputs from rdrand/rdseed anyhow
and add it to the pool and count entropy (if successful and if we trust cpu rng)
2. now somewhat later, we *again* try to mix in 256 bits from *rdrand only*
(ideally in non-attack cases we should use rdseed here if it gives us output) via
the interface that is by Linux RNG design intended for consuming entropy from
*untrusted* sources (hence no entropy counting), and if we fail this action
(which architecturally should not matter for Linux RNG based on the used interface)
we are going to panic the machine, because in fact this is the most important
for CoCo.

I find the logic above confusing and not very clean.
Should we just go back to the approach to add one more check in random_init_early()
to panic in the CoCo case if both rdseed and rdrand fails to give us anything?
This way one can still look at the Linux RNG code overall and understand what
is going to be the behavior in all conditions/cases?

Best Regards,
Elena.