Re: [PATCH v12 07/22] x86/virt/tdx: Add skeleton to enable TDX on demand

From: Sathyanarayanan Kuppuswamy
Date: Mon Jun 26 2023 - 17:22:08 EST




On 6/26/23 7:12 AM, Kai Huang wrote:
> To enable TDX the kernel needs to initialize TDX from two perspectives:
> 1) Do a set of SEAMCALLs to initialize the TDX module to make it ready
> to create and run TDX guests; 2) Do the per-cpu initialization SEAMCALL
> on one logical cpu before the kernel wants to make any other SEAMCALLs
> on that cpu (including those involved during module initialization and
> running TDX guests).
>
> The TDX module can be initialized only once in its lifetime. Instead
> of always initializing it at boot time, this implementation chooses an
> "on demand" approach that defers initializing TDX until there is a real
> need (e.g., when requested by KVM). This approach has the following pros:
>
> 1) It avoids consuming the memory that must be allocated by kernel and
> given to the TDX module as metadata (~1/256th of the TDX-usable memory),
> and also saves the CPU cycles of initializing the TDX module (and the
> metadata) when TDX is not used at all.
>
> 2) The TDX module design allows it to be updated while the system is
> running. The update procedure shares quite a few steps with this "on
> demand" initialization mechanism. The hope is that much of the "on demand"
> mechanism can be shared with a future "update" mechanism. A boot-time
> TDX module implementation would not be able to share much code with the
> update mechanism.
>
> 3) Making SEAMCALL requires VMX to be enabled. Currently, only the KVM
> code mucks with VMX enabling. If the TDX module were to be initialized
> separately from KVM (like at boot), the boot code would need to be
> taught how to muck with VMX enabling and KVM would need to be taught how
> to cope with that. Making KVM itself responsible for TDX initialization
> lets the rest of the kernel stay blissfully unaware of VMX.
>
> Similar to module initialization, also make the per-cpu initialization
> "on demand" as it also depends on VMX being enabled.
>
> Add two functions, tdx_enable() and tdx_cpu_enable(), to enable the TDX
> module and enable TDX on local cpu respectively. For now tdx_enable()
> is a placeholder. The TODO list will be pared down as functionality is
> added.
>
> Export both tdx_cpu_enable() and tdx_enable() for KVM use.
>
> In tdx_enable() use a state machine protected by a mutex to make sure the
> initialization will only be done once, as tdx_enable() can be called
> multiple times (e.g. the KVM module can be reloaded) and may be called
> concurrently by other kernel components in the future.
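
For anyone following the thread, the once-only behaviour described above can be modelled in user space. This is only a rough sketch under stated assumptions: a pthread mutex stands in for the kernel mutex, and model_enable(), model_init(), cur_status and the MODEL_* names are made up for illustration and are not part of the patch.

```c
/* User-space model only; not kernel code. */
#include <errno.h>
#include <pthread.h>

enum model_status {
	MODEL_UNKNOWN,		/* initialization not attempted yet */
	MODEL_INITIALIZED,	/* initialization succeeded */
	MODEL_ERROR		/* initialization failed, never retried */
};

static enum model_status cur_status = MODEL_UNKNOWN;
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static int model_init_calls;	/* how many times init actually ran */

/* Hypothetical stand-in for init_tdx_module(); returns the given result. */
static int model_init(int result)
{
	model_init_calls++;
	return result;
}

/* Mirrors the switch in tdx_enable(): run init at most once, then replay. */
static int model_enable(int init_result)
{
	int ret;

	pthread_mutex_lock(&model_lock);

	switch (cur_status) {
	case MODEL_UNKNOWN:
		ret = model_init(init_result);
		cur_status = ret ? MODEL_ERROR : MODEL_INITIALIZED;
		break;
	case MODEL_INITIALIZED:
		/* Already initialized, nothing to do. */
		ret = 0;
		break;
	default:
		/* A previous attempt failed; do not try again. */
		ret = -EINVAL;
		break;
	}

	pthread_mutex_unlock(&model_lock);

	return ret;
}
```

In this model a first call whose init fails returns the real error, and every later call returns -EINVAL without running init again, which matches how I read the switch in tdx_enable().
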
>
> The per-cpu initialization on each cpu can only be done once during the
> module's lifetime. Use a per-cpu variable to track its status to make
> sure it is only done once in tdx_cpu_enable().
>
> Also, a SEAMCALL to do TDX module global initialization must be done
> once on any logical cpu before any per-cpu initialization SEAMCALL. Do
> it inside tdx_cpu_enable() too (if it hasn't been done).
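
The ordering described in the two paragraphs above (one global init on whichever cpu gets there first, then one per-cpu init on each cpu) can be sketched as a single-threaded user-space model. All names here (model_cpu_enable(), model_sys_init(), model_sys_lp_init(), MODEL_NR_CPUS) are hypothetical, and the locking that the real code does with the raw spinlock is deliberately left out.

```c
/* Single-threaded user-space model only; no locking, not kernel code. */
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

static bool global_done;		/* models tdx_global_initialized */
static bool lp_done[MODEL_NR_CPUS];	/* models per-cpu tdx_lp_initialized */
static int global_init_calls;
static int lp_init_calls;

/* Hypothetical stubs standing in for the two SEAMCALLs. */
static int model_sys_init(void)
{
	global_init_calls++;
	return 0;
}

static int model_sys_lp_init(int cpu)
{
	(void)cpu;
	lp_init_calls++;
	return 0;
}

/* Global init runs once, on the first cpu that asks for it. */
static int model_try_init_global(void)
{
	int ret;

	if (global_done)
		return 0;

	ret = model_sys_init();
	if (!ret)
		global_done = true;

	return ret;
}

/* Per-cpu init runs once per cpu, always after the global init. */
static int model_cpu_enable(int cpu)
{
	int ret;

	if (cpu < 0 || cpu >= MODEL_NR_CPUS)
		return -EINVAL;

	if (lp_done[cpu])
		return 0;

	ret = model_try_init_global();
	if (ret)
		return ret;

	ret = model_sys_lp_init(cpu);
	if (ret)
		return ret;

	lp_done[cpu] = true;

	return 0;
}
```

Calling model_cpu_enable() twice on one cpu and once on another should run the global stub once and the per-cpu stub twice.
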
>
> tdx_enable() can potentially invoke SEAMCALLs on any online cpus. The
> per-cpu initialization must be done before those SEAMCALLs are invoked
> on some cpu. To keep things simple, in tdx_cpu_enable(), always do the
> per-cpu initialization regardless of whether the TDX module has been
> initialized or not. And in tdx_enable(), don't call tdx_cpu_enable()
> but assume the caller has disabled CPU hotplug, done VMXON and
> tdx_cpu_enable() on all online cpus before calling tdx_enable().
>
> Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
> ---
>
> v11 -> v12:
> - Simplified TDX module global init and lp init status tracking (David).
> - Added comment around try_init_module_global() for using
> raw_spin_lock() (Dave).
> - Added one sentence to changelog to explain why to expose tdx_enable()
> and tdx_cpu_enable() (Dave).
> - Simplified comments around tdx_enable() and tdx_cpu_enable() to use
> lockdep_assert_*() instead. (Dave)
> - Removed redundant "TDX" in error message (Dave).
>
> v10 -> v11:
> - Return -ENODEV instead of -EINVAL when CONFIG_INTEL_TDX_HOST is off.
> - Return the actual error code for tdx_enable() instead of -EINVAL.
> - Added Isaku's Reviewed-by.
>
> v9 -> v10:
> - Merged the patch to handle per-cpu initialization to this patch to
> tell the story better.
> - Changed how to handle the per-cpu initialization to only provide a
> tdx_cpu_enable() function to let the user of TDX do it when the
> user wants to run TDX code on a certain cpu.
> - Changed tdx_enable() to not call cpus_read_lock() explicitly, but
> call lockdep_assert_cpus_held() to assume the caller has done that.
> - Improved comments around tdx_enable() and tdx_cpu_enable().
> - Improved changelog to tell the story better accordingly.
>
> v8 -> v9:
> - Removed detailed TODO list in the changelog (Dave).
> - Added back steps to do module global initialization and per-cpu
> initialization in the TODO list comment.
> - Moved the 'enum tdx_module_status_t' from tdx.c to local tdx.h
>
> v7 -> v8:
> - Refined changelog (Dave).
> - Removed "all BIOS-enabled cpus" related code (Peter/Thomas/Dave).
> - Add a "TODO list" comment in init_tdx_module() to list all steps of
> initializing the TDX Module to tell the story (Dave).
> - Made tdx_enable() universally return -EINVAL, and removed nonsense
> comments (Dave).
> - Simplified __tdx_enable() to only handle success or failure.
> - TDX_MODULE_SHUTDOWN -> TDX_MODULE_ERROR
> - Removed TDX_MODULE_NONE (not loaded) as it is not necessary.
> - Improved comments (Dave).
> - Pointed out 'tdx_module_status' is a software thing (Dave).
>
> v6 -> v7:
> - No change.
>
> v5 -> v6:
> - Added code to set status to TDX_MODULE_NONE if TDX module is not
> loaded (Chao)
> - Added Chao's Reviewed-by.
> - Improved comments around cpus_read_lock().
>
> v3 -> v5 (no feedback on v4):
> - Removed the check that SEAMRR and TDX KeyID have been detected on
> all present cpus.
> - Removed tdx_detect().
> - Added num_online_cpus() to MADT-enabled CPUs check within the CPU
> hotplug lock and return early with error message.
> - Improved dmesg printing for TDX module detection and initialization.
>
>
> ---
> arch/x86/include/asm/tdx.h | 4 +
> arch/x86/virt/vmx/tdx/tdx.c | 162 ++++++++++++++++++++++++++++++++++++
> arch/x86/virt/vmx/tdx/tdx.h | 13 +++
> 3 files changed, 179 insertions(+)
>
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 4dfe2e794411..d8226a50c58c 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -97,8 +97,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>
> #ifdef CONFIG_INTEL_TDX_HOST
> bool platform_tdx_enabled(void);
> +int tdx_cpu_enable(void);
> +int tdx_enable(void);
> #else /* !CONFIG_INTEL_TDX_HOST */
> static inline bool platform_tdx_enabled(void) { return false; }
> +static inline int tdx_cpu_enable(void) { return -ENODEV; }
> +static inline int tdx_enable(void) { return -ENODEV; }
> #endif /* CONFIG_INTEL_TDX_HOST */
>
> #endif /* !__ASSEMBLY__ */
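
A side note on the !CONFIG_INTEL_TDX_HOST stubs: returning -ENODEV from static inlines means a caller does not need its own #ifdef. A minimal user-space sketch, assuming the config symbol is left undefined; example_setup() is a hypothetical caller and not something in this series:

```c
/* User-space sketch; CONFIG_INTEL_TDX_HOST is deliberately left undefined. */
#include <errno.h>
#include <stdbool.h>

#ifdef CONFIG_INTEL_TDX_HOST
bool platform_tdx_enabled(void);
int tdx_cpu_enable(void);
int tdx_enable(void);
#else /* !CONFIG_INTEL_TDX_HOST */
static inline bool platform_tdx_enabled(void) { return false; }
static inline int tdx_cpu_enable(void) { return -ENODEV; }
static inline int tdx_enable(void) { return -ENODEV; }
#endif /* CONFIG_INTEL_TDX_HOST */

/* Hypothetical caller: no #ifdef needed at the call site. */
static int example_setup(void)
{
	int ret;

	ret = tdx_cpu_enable();
	if (ret)
		return ret;

	return tdx_enable();
}
```

With the symbol undefined, example_setup() simply propagates -ENODEV from the first stub.
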
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 141d12376c4d..29ca18f66d61 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -13,6 +13,10 @@
> #include <linux/errno.h>
> #include <linux/printk.h>
> #include <linux/smp.h>
> +#include <linux/cpu.h>
> +#include <linux/spinlock.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/mutex.h>
> #include <asm/msr-index.h>
> #include <asm/msr.h>
> #include <asm/archrandom.h>
> @@ -23,6 +27,13 @@ static u32 tdx_global_keyid __ro_after_init;
> static u32 tdx_guest_keyid_start __ro_after_init;
> static u32 tdx_nr_guest_keyids __ro_after_init;
>
> +static bool tdx_global_initialized;
> +static DEFINE_RAW_SPINLOCK(tdx_global_init_lock);

Why use raw_spin_lock()?

> +static DEFINE_PER_CPU(bool, tdx_lp_initialized);
> +
> +static enum tdx_module_status_t tdx_module_status;
> +static DEFINE_MUTEX(tdx_module_lock);

I think you could add a single-line comment describing what state the above
variables track. But it is entirely up to you.

> +
> /*
> * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
> * to kernel error code. @seamcall_ret and @out contain the SEAMCALL
> @@ -74,6 +85,157 @@ static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> }
> }
>
> +/*
> + * Do the module global initialization if not done yet.
> + * It's always called with interrupts and preemption disabled.
> + */
> +static int try_init_module_global(void)
> +{
> +	unsigned long flags;
> +	int ret;
> +
> +	/*
> +	 * The TDX module global initialization only needs to be done
> +	 * once on any cpu.
> +	 */
> +	raw_spin_lock_irqsave(&tdx_global_init_lock, flags);
> +
> +	if (tdx_global_initialized) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	/* All '0's are just unused parameters. */

I have noticed that you add the above comment whenever you call seamcall() with
0 as parameters. Is this an ask from the maintainer? If not, I think you can skip
it. Explaining the parameters once at the seamcall() function definition is good
enough.

> +	ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL);
> +	if (!ret)
> +		tdx_global_initialized = true;
> +out:
> +	raw_spin_unlock_irqrestore(&tdx_global_init_lock, flags);
> +
> +	return ret;
> +}
> +
> +/**
> + * tdx_cpu_enable - Enable TDX on local cpu
> + *
> + * Do one-time TDX module per-cpu initialization SEAMCALL (and TDX module
> + * global initialization SEAMCALL if not done) on local cpu to make this
> + * cpu be ready to run any other SEAMCALLs.
> + *
> + * Call this function with preemption disabled.
> + *
> + * Return 0 on success, otherwise errors.
> + */
> +int tdx_cpu_enable(void)
> +{
> +	int ret;
> +
> +	if (!platform_tdx_enabled())
> +		return -ENODEV;
> +
> +	lockdep_assert_preemption_disabled();
> +
> +	/* Already done */
> +	if (__this_cpu_read(tdx_lp_initialized))
> +		return 0;
> +
> +	/*
> +	 * The TDX module global initialization is the very first step
> +	 * to enable TDX. Need to do it first (if hasn't been done)
> +	 * before the per-cpu initialization.
> +	 */
> +	ret = try_init_module_global();
> +	if (ret)
> +		return ret;
> +
> +	/* All '0's are just unused parameters */
> +	ret = seamcall(TDH_SYS_LP_INIT, 0, 0, 0, 0, NULL, NULL);
> +	if (ret)
> +		return ret;
> +
> +	__this_cpu_write(tdx_lp_initialized, true);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(tdx_cpu_enable);
> +
> +static int init_tdx_module(void)
> +{
> +	/*
> +	 * TODO:
> +	 *
> +	 * - Get TDX module information and TDX-capable memory regions.
> +	 * - Build the list of TDX-usable memory regions.
> +	 * - Construct a list of "TD Memory Regions" (TDMRs) to cover
> +	 *   all TDX-usable memory regions.
> +	 * - Configure the TDMRs and the global KeyID to the TDX module.
> +	 * - Configure the global KeyID on all packages.
> +	 * - Initialize all TDMRs.
> +	 *
> +	 * Return error before all steps are done.
> +	 */
> +	return -EINVAL;
> +}
> +
> +static int __tdx_enable(void)
> +{
> +	int ret;
> +
> +	ret = init_tdx_module();
> +	if (ret) {
> +		pr_err("module initialization failed (%d)\n", ret);
> +		tdx_module_status = TDX_MODULE_ERROR;
> +		return ret;
> +	}
> +
> +	pr_info("module initialized.\n");
> +	tdx_module_status = TDX_MODULE_INITIALIZED;
> +
> +	return 0;
> +}
> +
> +/**
> + * tdx_enable - Enable TDX module to make it ready to run TDX guests
> + *
> + * This function assumes the caller has: 1) held read lock of CPU hotplug
> + * lock to prevent any new cpu from becoming online; 2) done both VMXON
> + * and tdx_cpu_enable() on all online cpus.
> + *
> + * This function can be called in parallel by multiple callers.
> + *
> + * Return 0 if TDX is enabled successfully, otherwise error.
> + */
> +int tdx_enable(void)
> +{
> +	int ret;
> +
> +	if (!platform_tdx_enabled())
> +		return -ENODEV;
> +
> +	lockdep_assert_cpus_held();
> +
> +	mutex_lock(&tdx_module_lock);
> +
> +	switch (tdx_module_status) {
> +	case TDX_MODULE_UNKNOWN:
> +		ret = __tdx_enable();
> +		break;
> +	case TDX_MODULE_INITIALIZED:
> +		/* Already initialized, great, tell the caller. */
> +		ret = 0;
> +		break;
> +	default:
> +		/* Failed to initialize in the previous attempts */
> +		ret = -EINVAL;
> +		break;
> +	}
> +
> +	mutex_unlock(&tdx_module_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_enable);
> +
> static int __init record_keyid_partitioning(u32 *tdx_keyid_start,
> u32 *nr_tdx_keyids)
> {
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 55dbb1b8c971..9fb46033c852 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -16,11 +16,24 @@
> */
> #define TDX_RND_NO_ENTROPY 0x8000020300000000ULL
>
> +/*
> + * TDX module SEAMCALL leaf functions
> + */
> +#define TDH_SYS_INIT 33
> +#define TDH_SYS_LP_INIT 35
> +
> /*
> * Do not put any hardware-defined TDX structure representations below
> * this comment!
> */
>
> +/* Kernel defined TDX module status during module initialization. */
> +enum tdx_module_status_t {
> +	TDX_MODULE_UNKNOWN,
> +	TDX_MODULE_INITIALIZED,
> +	TDX_MODULE_ERROR
> +};
> +
> struct tdx_module_output;
> u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> struct tdx_module_output *out);

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer