Re: [PATCH v8 003/103] KVM: Refactor CPU compatibility check on module initialization

From: Huang, Kai
Date: Thu Aug 11 2022 - 07:16:13 EST


First of all, I think the patch title can be improved. "refactor CPU
compatibility check on module initialization" isn't the purpose of this patch.
It is just a bonus. The title should reflect the main purpose (or behaviour) of
this patch:

KVM: Temporarily enable hardware on all cpus during module loading time

On Sun, 2022-08-07 at 15:00 -0700, isaku.yamahata@xxxxxxxxx wrote:
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
>
> TDX module requires its initialization. It requires VMX to be enabled.
> Although there are several options of when to initialize it, the choice is
> the initialization time of the KVM kernel module. There is no usable
> arch-specific hook for the TDX module to utilize during the KVM kernel module
> initialization. The code doesn't enable/disable hardware (VMX in TDX case)
> during the kernel module initialization. Add a hook for enabling hardware,
> arch-specific initialization, and disabling hardware during KVM kernel
> module initialization to make a room for TDX module initialization.  
>

The hook is only for arch-specific initialization. It doesn't do
hardware_enable_all() and hardware_disable_all().

> The
> current KVM enables hardware when the first VM is created and disables
> hardware when the last VM is destroyed. When no VM is running, hardware is
> disabled. To follow these semantics, the kernel module initialization needs
> to disable hardware. Opportunistically refactor the code to enable/disable
> hardware.
>
> Add hadware_enable_all() and hardware_disable_all() to kvm_init() and
> introduce a new arch-specific callback function,
> kvm_arch_post_hardware_enable_setup, for arch to do arch-specific
> initialization that requires hardware_enable_all(). Opportunistically,
> move kvm_arch_check_processor_compat() to to hardware_enabled_nolock().
> TDX module initialization code will go into
> kvm_arch_post_hardware_enable_setup().

The remaining paragraphs are very hard to follow. IMHO they are too details and
are not changelog material. If you can obtain Reviewed-by from
reviewer/maintainer for specific arch, then it implies the code is fine to that
arch, and you don't need those details below in changelog.

So perhaps just say something like:

With doing hardware_enable_all() and hardware_disable_all() in kvm_init(), some
arch-specific code which must be done before them must also be moved before
kvm_init(). Only VMX and MIPS have such code:
- For VMX, per-cpu variable loaded_vmcss_on_cpu must be initialized before
hardware_disable_all(). Introduce vmx_early_init() to do it and call
vmx_eraly_init() before kvm_init() in vmx_init().
- For MIPS, move kvm_init() to the end of kvm_mips_init() (or whatever you
like...).


>
> This patch reorders some function calls as below from (*) (**) (A) and (B)
> to (A) (B) and (*). Here (A) and (B) depends on (*), but not (**). By
> code inspection, only mips and VMX has the code of (*). No other
> arch has empty (*). So refactor mips and VMX and eliminate the
> necessity hook for (*) instead of adding an unused hook.
>
> Before this patch:
> - Arch module initialization
> - kvm_init()
> - kvm_arch_init()
> - kvm_arch_check_processor_compat() on each CPUs
> - post-arch-specific initialization -- (*): (A) and (B) depends on this
> - post-arch-specific initialization -- (**): no dependency to (A) and (B)
>
> - When creating/deleting the first/last VM
> - kvm_arch_hardware_enable() on each CPUs -- (A)
> - kvm_arch_hardware_disable() on each CPUs -- (B)
>
> After this patch:
> - Arch module initialization
> - kvm_init()
> - kvm_arch_init()
> - arch-specific initialization -- (*)
> - kvm_arch_check_processor_compat() on each CPUs
> - kvm_arch_hardware_enable() on each CPUs -- (A)
> - kvm_arch_hardware_disable() on each CPUs -- (B)
> - post-arch-specific initialization -- (**)
>
> - When creating/deleting the first/last VM (no logic change)
> - kvm_arch_hardware_enable() on each CPUs -- (A)
> - kvm_arch_hardware_disable() on each CPUs -- (B)
>
> Code inspection result:
> As long as I inspected, I found only mips and VMX have non-empty (*) or
> non-empty (A) or (B).
> x86: tested on a real machine
> mips: compile test only
> powerpc, s390, arm, riscv: code inspection only
>
> - arch/mips/kvm/mips.c
> module init function, kvm_mips_init(), does some initialization after
> kvm_init(). Compile test only.
>
> - arch/x86/kvm/x86.c
> - uses vm_list which is statically initialized.
> - static_call(kvm_x86_hardware_enable)();
> - SVM: (*) and (**) are empty.
> - VMX: initialize percpu variable loaded_vmcss_on_cpu that VMXON uses.
>
> - arch/powerpc/kvm/powerpc.c
> kvm_arch_hardware_enable/disable() are nop
>
> - arch/s390/kvm/kvm-s390.c
> kvm_arch_hardware_enable/disable() are nop
>
> - arch/arm64/kvm/arm.c
> module init function, arm_init(), calls only kvm_init().
> (*) and (**) are empty
>
> - arch/riscv/kvm/main.c
> module init function, riscv_kvm_init(), calls only kvm_init().
> (*) and (**) are empty
>
> Co-developed-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> ---
> arch/mips/kvm/mips.c | 16 +++++++++++-----
> arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++----
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 31 ++++++++++++++++++++++++-------
> 4 files changed, 55 insertions(+), 16 deletions(-)
>
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 092d09fb6a7e..fd7339cff57c 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -1642,12 +1642,11 @@ static int __init kvm_mips_init(void)
> return -EOPNOTSUPP;
> }
>
> + /*
> + * kvm_init() calls kvm_arch_hardware_enable/disable(). The early
> + * initialization is needed before calling kvm_init().
> + */

Sorry I take this back. It's upto MIPS reviewer/maintainer to decide whether we
need a comment here.

> ret = kvm_mips_entry_setup();
> - if (ret)
> - return ret;
> -
> - ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> -
> if (ret)
> return ret;
>
> @@ -1656,6 +1655,13 @@ static int __init kvm_mips_init(void)
>
> register_die_notifier(&kvm_mips_csr_die_notifier);
>
> + ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> +
> + if (ret) {
> + unregister_die_notifier(&kvm_mips_csr_die_notifier);
> + return ret;
> + }
> +
> return 0;
> }
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c5637a4c1c43..4d7e1073984d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8433,6 +8433,23 @@ static void vmx_exit(void)
> }
> module_exit(vmx_exit);
>
> +/*
> + * Early initialization before kvm_init() so that vmx_hardware_enable/disable()
> + * can work.
> + */
> +static void __init vmx_init_early(void)
> +{
> + int cpu;
> +
> + /*
> + * vmx_hardware_disable() accesses loaded_vmcss_on_cpu list.
> + * Initialize the variable before kvm_init() that calls
> + * vmx_hardware_enable/disable().
> + */
> + for_each_possible_cpu(cpu)
> + INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> +}
> +
> static int __init vmx_init(void)
> {
> int r, cpu;
> @@ -8470,6 +8487,7 @@ static int __init vmx_init(void)
> }
> #endif
>
> + vmx_init_early();
> r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
> __alignof__(struct vcpu_vmx), THIS_MODULE);
> if (r)
> @@ -8490,11 +8508,8 @@ static int __init vmx_init(void)
>
> vmx_setup_fb_clear_ctrl();
>
> - for_each_possible_cpu(cpu) {
> - INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> -
> + for_each_possible_cpu(cpu)
> pi_init_cpu(cpu);
> - }
>
> #ifdef CONFIG_KEXEC_CORE
> rcu_assign_pointer(crash_vmclear_loaded_vmcss,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9643b8eadefe..63d4e3ce3e53 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1437,6 +1437,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
> int kvm_arch_hardware_enable(void);
> void kvm_arch_hardware_disable(void);
> int kvm_arch_hardware_setup(void *opaque);
> +int kvm_arch_post_hardware_enable_setup(void *opaque);
> void kvm_arch_hardware_unsetup(void);
> int kvm_arch_check_processor_compat(void);
> int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 0f5767e5ae45..a43fdbfaede2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4995,8 +4995,13 @@ static void hardware_enable_nolock(void *junk)
>
> cpumask_set_cpu(cpu, cpus_hardware_enabled);
>
> + r = kvm_arch_check_processor_compat();
> + if (r)
> + goto out;
> +
> r = kvm_arch_hardware_enable();
>
> +out:
> if (r) {
> cpumask_clear_cpu(cpu, cpus_hardware_enabled);
> atomic_inc(&hardware_enable_failed);
> @@ -5793,9 +5798,9 @@ void kvm_unregister_perf_callbacks(void)
> }
> #endif
>
> -static void check_processor_compat(void *rtn)
> +__weak int kvm_arch_post_hardware_enable_setup(void *opaque)
> {
> - *(int *)rtn = kvm_arch_check_processor_compat();
> + return 0;
> }
>
> int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> @@ -5828,11 +5833,23 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> if (r < 0)
> goto out_free_1;
>
> - for_each_online_cpu(cpu) {
> - smp_call_function_single(cpu, check_processor_compat, &r, 1);
> - if (r < 0)
> - goto out_free_2;
> - }
> + /* hardware_enable_nolock() checks CPU compatibility on each CPUs. */
> + r = hardware_enable_all();
> + if (r)
> + goto out_free_2;
> + /*
> + * Arch specific initialization that requires to enable virtualization
> + * feature. e.g. TDX module initialization requires VMXON on all
> + * present CPUs.
> + */
> + kvm_arch_post_hardware_enable_setup(opaque);
> + /*
> + * Make hardware disabled after the KVM module initialization. KVM
> + * enables hardware when the first KVM VM is created and disables
> + * hardware when the last KVM VM is destroyed. When no KVM VM is
> + * running, hardware is disabled. Keep that semantics.
> + */

Except the first sentence, the remaining sentences are more like changelog
material. Perhaps just say something below to be more specific on the purpose:

/*
* Disable hardware on all cpus so that out-of-tree drivers which
* also use hardware-assisted virtualization (such as virtualbox
* kernel module) can still be loaded when KVM is loaded.
*/

> + hardware_disable_all();
>
> r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
> kvm_starting_cpu, kvm_dying_cpu);


Anyway (although I still think it's better to introduce the __weak
kvm_arch_post_hardware_enable_setup() in later patch together with the VMX
version to support TDX), with above comments addressed:

Acked-by: Kai Huang <kai.huang@xxxxxxxxx>

--
Thanks,
-Kai