Re: [PATCH v7 006/102] KVM: TDX: Detect CPU feature on kernel module initialization

From: Kai Huang
Date: Mon Jun 27 2022 - 23:43:11 EST


On Mon, 2022-06-27 at 14:52 -0700, isaku.yamahata@xxxxxxxxx wrote:
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> TDX requires several initialization steps for KVM to create guest TDs.
> Detect CPU feature, enable VMX (TDX is based on VMX), detect TDX module
> availability, and initialize TDX module. This patch implements the first
> step to detect CPU feature. Because VMX isn't enabled yet by VMXON
> instruction on KVM kernel module initialization, defer further
> initialization step until VMX is enabled by hardware_enable callback.

Not clear why you need to split into multiple patches. If we put all
initialization into one patch, it's much easier to see why those steps are done
in whatever way.

>
> Introduce a module parameter, enable_tdx, to explicitly enable TDX KVM
> support. It's off by default to keep same behavior for those who don't use
> TDX. Implement CPU feature detection at KVM kernel module initialization
> as hardware_setup callback to check if CPU feature is available and get
> some CPU parameters.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> ---
> arch/x86/kvm/Makefile | 1 +
> arch/x86/kvm/vmx/main.c | 18 ++++++++++++++++-
> arch/x86/kvm/vmx/tdx.c | 40 ++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/x86_ops.h | 6 ++++++
> 4 files changed, 64 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/kvm/vmx/tdx.c
>
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index ee4d0999f20f..e2c05195cb95 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -24,6 +24,7 @@ kvm-$(CONFIG_KVM_XEN) += xen.o
> kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
> vmx/evmcs.o vmx/nested.o vmx/posted_intr.o vmx/main.o
> kvm-intel-$(CONFIG_X86_SGX_KVM) += vmx/sgx.o
> +kvm-intel-$(CONFIG_INTEL_TDX_HOST) += vmx/tdx.o
>
> kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
>
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index 636768f5b985..fabf5f22c94f 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -6,6 +6,22 @@
> #include "nested.h"
> #include "pmu.h"
>
> +static bool __read_mostly enable_tdx = IS_ENABLED(CONFIG_INTEL_TDX_HOST);
> +module_param_named(tdx, enable_tdx, bool, 0444);
> +
> +static __init int vt_hardware_setup(void)
> +{
> + int ret;
> +
> + ret = vmx_hardware_setup();
> + if (ret)
> + return ret;
> +
> + enable_tdx = enable_tdx && !tdx_hardware_setup(&vt_x86_ops);
> +
> + return 0;
> +}
> +
> struct kvm_x86_ops vt_x86_ops __initdata = {
> .name = "kvm_intel",
>
> @@ -147,7 +163,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> struct kvm_x86_init_ops vt_init_ops __initdata = {
> .cpu_has_kvm_support = vmx_cpu_has_kvm_support,
> .disabled_by_bios = vmx_disabled_by_bios,
> - .hardware_setup = vmx_hardware_setup,
> + .hardware_setup = vt_hardware_setup,
> .handle_intel_pt_intr = NULL,
>
> .runtime_ops = &vt_x86_ops,
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> new file mode 100644
> index 000000000000..c12e61cdddea
> --- /dev/null
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -0,0 +1,40 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/cpu.h>
> +
> +#include <asm/tdx.h>
> +
> +#include "capabilities.h"
> +#include "x86_ops.h"
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt) "tdx: " fmt
> +
> +static u64 hkid_mask __ro_after_init;
> +static u8 hkid_start_pos __ro_after_init;
> +
> +int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops)
> +{
> + u32 max_pa;
> +
> + if (!enable_ept) {
> + pr_warn("Cannot enable TDX with EPT disabled\n");
> + return -EINVAL;
> + }
> +
> + if (!platform_tdx_enabled()) {
> + pr_warn("Cannot enable TDX on TDX disabled platform\n");
> + return -ENODEV;
> + }
> +
> + /* Safe guard check because TDX overrides tlb_remote_flush callback. */
> + if (WARN_ON_ONCE(x86_ops->tlb_remote_flush))
> + return -EIO;

To me it's better to move this chunk to the patch which actually implements how
to flush TLB foro private pages. W/o some background, it's hard to tell why TDX
needs to overrides tlb_remote_flush callback. Otherwise it's quite hard to
review here.

For instance, even if it must be replaced, I am wondering why it must be empty
at the beginning? For instance, assuming it has an original version which does
something:

x86_ops->tlb_remote_flush = vmx_remote_flush;

Why cannot it be replaced with vt_tlb_remote_flush():

int vt_tlb_remote_flush(struct kvm *kvm)
{
if (is_td(kvm))
return tdx_tlb_remote_flush(kvm);

return vmx_remote_flush(kvm);
}

?

> +
> + max_pa = cpuid_eax(0x80000008) & 0xff;
> + hkid_start_pos = boot_cpu_data.x86_phys_bits;
> + hkid_mask = GENMASK_ULL(max_pa - 1, hkid_start_pos);
> + pr_info("kvm: TDX is supported. hkid start pos %d mask 0x%llx\n",
> + hkid_start_pos, hkid_mask);

Again, I think it's better to introduce those in the patch where you actually
need those. It will be more clear if you introduce those with the code which
actually uses them.

For instance, I think both hkid_start_pos and hkid_mask are not necessary. If
you want to apply one keyid to an address, isn't below enough?

u64 phys |= ((keyid) << boot_cpu_data.x86_phys_bits);

> +
> + return 0;
> +}
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index 0f8a8547958f..0a5967a91e26 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -122,4 +122,10 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
> #endif
> void vmx_setup_mce(struct kvm_vcpu *vcpu);
>
> +#ifdef CONFIG_INTEL_TDX_HOST
> +int __init tdx_hardware_setup(struct kvm_x86_ops *x86_ops);
> +#else
> +static inline int tdx_hardware_setup(struct kvm_x86_ops *x86_ops) { return 0; }
> +#endif

I think if you introduce a "tdx_ops.h", or "tdx_x86_ops.h", and you can only
include it when CONFIG_INTEL_TDX_HOST is true, then in tdx_ops.h you don't need
those stubs.

Makes sense?

> +
> #endif /* __KVM_X86_VMX_X86_OPS_H */