RE: [PATCH v2 5/9] Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM

From: Michael Kelley (LINUX)
Date: Mon Aug 21 2023 - 10:29:42 EST


From: Dexuan Cui <decui@xxxxxxxxxxxxx> Sent: Sunday, August 20, 2023 1:27 PM
>
> Don't set *this_cpu_ptr(hyperv_pcpu_input_arg) before the function
> set_memory_decrypted() returns, otherwise we run into this tricky issue:
>
> For a fully enlightened TDX/SNP VM, in hv_common_cpu_init(),
> *this_cpu_ptr(hyperv_pcpu_input_arg) is an encrypted page before
> the set_memory_decrypted() returns.
>
> When such a VM has more than 64 VPs, if the hyperv_pcpu_input_arg is not
> NULL, hv_common_cpu_init() -> set_memory_decrypted() -> ... ->
> cpa_flush() -> on_each_cpu() -> ... -> hv_send_ipi_mask() -> ... ->
> __send_ipi_mask_ex() tries to call hv_do_fast_hypercall16() with the

Actually, __send_ipi_mask_ex() calls hv_do_rep_hypercall(), not
hv_do_fast_hypercall16(). Using the memory-based hyperv_pcpu_input_arg
with a "fast" hypercall doesn't make sense.

> hyperv_pcpu_input_arg as the hypercall input page, which must be a
> decrypted page in such a VM, but the page is still encrypted at this
> point, and a fatal fault is triggered.
>
> Fix the issue by setting *this_cpu_ptr(hyperv_pcpu_input_arg) after
> set_memory_decrypted(): if the hyperv_pcpu_input_arg is NULL,
> __send_ipi_mask_ex() returns HV_STATUS_INVALID_PARAMETER immediately,
> and hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
> which can use x2apic_send_IPI_all(), which may be slightly slower than
> the hypercall but still works correctly in such a VM.
>
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
> ---
>
> Changes in v2: None
>
> drivers/hv/hv_common.c | 30 +++++++++++++++++++++++-------
> 1 file changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 897bbb96f4118..4c858e1636da7 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -360,6 +360,7 @@ int hv_common_cpu_init(unsigned int cpu)
>  	u64 msr_vp_index;
>  	gfp_t flags;
>  	int pgcount = hv_root_partition ? 2 : 1;
> +	void *mem;
>  	int ret;
>
>  	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> @@ -372,25 +373,40 @@ int hv_common_cpu_init(unsigned int cpu)
>  	 * allocated if this CPU was previously online and then taken offline
>  	 */
>  	if (!*inputarg) {
> -		*inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
> -		if (!(*inputarg))
> +		mem = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
> +		if (!mem)
>  			return -ENOMEM;
>
>  		if (hv_root_partition) {
>  			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> -			*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> +			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>  		}
>
>  		if (hv_isolation_type_en_snp() || hv_isolation_type_tdx()) {
> -			ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
> +			ret = set_memory_decrypted((unsigned long)mem, pgcount);
>  			if (ret) {
> -				/* It may be unsafe to free *inputarg */
> -				*inputarg = NULL;
> +				/* It may be unsafe to free 'mem' */
>  				return ret;
>  			}
>
> -			memset(*inputarg, 0x00, pgcount * PAGE_SIZE);
> +			memset(mem, 0x00, pgcount * HV_HYP_PAGE_SIZE);
>  		}
> +
> +		/*
> +		 * In a fully enlightened TDX/SNP VM with more than 64 VPs, if
> +		 * hyperv_pcpu_input_arg is not NULL, set_memory_decrypted() ->
> +		 * ... -> cpa_flush() -> ... -> __send_ipi_mask_ex() tries to
> +		 * use hyperv_pcpu_input_arg as the hypercall input page, which
> +		 * must be a decrypted page in such a VM, but the page is still
> +		 * encrypted before set_memory_decrypted() returns. Fix this by
> +		 * setting *inputarg after the above set_memory_decrypted(): if
> +		 * hyperv_pcpu_input_arg is NULL, __send_ipi_mask_ex() returns
> +		 * HV_STATUS_INVALID_PARAMETER immediately, and the function
> +		 * hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
> +		 * which may be slightly slower than the hypercall, but still
> +		 * works correctly in such a VM.
> +		 */
> +		*inputarg = mem;
>  	}
>
>  	msr_vp_index = hv_get_register(HV_REGISTER_VP_INDEX);
> --
> 2.25.1

Modulo the minor correction in the commit message,

Reviewed-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>