RE: [PATCH v2 1/2] x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline

From: Dexuan Cui
Date: Tue May 23 2023 - 14:30:38 EST


> From: Michael Kelley (LINUX) <mikelley@xxxxxxxxxxxxx>
> Sent: Tuesday, May 23, 2023 10:14 AM
> To: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
> <haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
> <decui@xxxxxxxxxxxxx>; catalin.marinas@xxxxxxx; will@xxxxxxxxxx;
> tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx; bp@xxxxxxxxx;
> dave.hansen@xxxxxxxxxxxxxxx; hpa@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-hyperv@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> x86@xxxxxxxxxx
> Cc: Michael Kelley (LINUX) <mikelley@xxxxxxxxxxxxx>
> Subject: [PATCH v2 1/2] x86/hyperv: Fix hyperv_pcpu_input_arg handling
> when CPUs go online/offline
>
> These commits
>
> a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with
> hyperv_pcpu_input_arg")
> 2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")
>
> update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg
> because that memory will be correctly marked as decrypted or encrypted
> for all VM types (CoCo or normal). But problems ensue when CPUs in the
> VM go online or offline after virtual PCI devices have been configured.
>
> When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is
> initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN.
> But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which
> may call the virtual PCI driver and fault trying to use the as yet
> uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo
> VM if the MMIO read and write hypercalls are used from state
> CPUHP_AP_IRQ_AFFINITY_ONLINE.
>
> When a CPU is taken offline, IRQs may be reassigned in state
> CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to
> use the hyperv_pcpu_input_arg that has already been freed by a
> higher state.
>
> Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE
> immediately after CPUHP_AP_ONLINE_IDLE (similar to
> CPUHP_AP_KVM_ONLINE)
> and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for
> Hyper-V initialization so that hyperv_pcpu_input_arg is allocated
> early enough.
>
> Fix the offlining problem by not freeing hyperv_pcpu_input_arg when
> a CPU goes offline. Retain the allocated memory, and reuse it if
> the CPU comes back online later.
>
> Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
> ---
>
> Changes in v2:
> * Put CPUHP_AP_HYPERV_ONLINE before CPUHP_AP_KVM_ONLINE [Vitaly
> Kuznetsov]

Reviewed-by: Dexuan Cui <decui@xxxxxxxxxxxxx>