Re: [PATCH v2 1/1] Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs

From: Wei Liu
Date: Tue May 23 2023 - 14:54:12 EST


On Thu, May 18, 2023 at 08:13:52AM -0700, Michael Kelley wrote:
> vmbus_wait_for_unload() may be called in the panic path after other
> CPUs are stopped. vmbus_wait_for_unload() currently loops through
> online CPUs looking for the UNLOAD response message. But the values of
> CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
> to stop the other CPUs, and in one of the paths the stopped CPUs
> are removed from cpu_online_mask. This removal happens in both
> x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
> only checks the panic'ing CPU, and misses the UNLOAD response message
> except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
> eventually times out, but only after waiting 100 seconds.
>
> Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
> The cpu_present_mask is not modified by stopping the other CPUs in the
> panic path, nor should it be.
>
> Also, in a CoCo VM the synic_message_page is not allocated in
> hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs()
> and hv_synic_disable_regs() such that it is set only when the CPU is
> online. If not all present CPUs are online when vmbus_wait_for_unload()
> is called, the synic_message_page might be NULL. Add a check for this.
>
> Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")
> Cc: stable@xxxxxxxxxxxxxxx
> Reported-by: John Starks <jostarks@xxxxxxxxxxxxx>
> Signed-off-by: Michael Kelley <mikelley@xxxxxxxxxxxxx>
> Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

Applied to hyperv-fixes. Thanks.
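
For readers following the thread, here is a rough userspace sketch of the
scanning logic the commit message describes. It is not the kernel patch
itself. Everything prefixed sim_/SIM_ is hypothetical and stands in for the
real cpumask helpers and the per-CPU synic_message_page. A CPU mask is
modelled as a plain bitmask, and a NULL page models a CPU whose message
page was never set up because the CPU is not online.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_NR_CPUS             4
#define SIM_MSG_NONE            0
#define SIM_MSG_UNLOAD_RESPONSE 1   /* placeholder value, not the kernel's */

/* Stand-in for a per-CPU message page holding one message type. */
struct sim_msg_page {
	int msg_type;
};

/*
 * Scan the CPUs set in cpu_mask for the UNLOAD response.
 * Returns the CPU number whose page holds the response, or -1 if none.
 * A NULL page is skipped, mirroring the extra check added for CoCo VMs.
 */
static int sim_scan_for_unload(uint32_t cpu_mask,
			       struct sim_msg_page *pages[SIM_NR_CPUS])
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (!(cpu_mask & (1u << cpu)))
			continue;	/* CPU not in the mask being scanned */
		if (pages[cpu] == NULL)
			continue;	/* message page not set for this CPU */
		if (pages[cpu]->msg_type == SIM_MSG_UNLOAD_RESPONSE)
			return cpu;
	}
	return -1;
}
```

If the response sits on CPU 0 while the panicking CPU is CPU 2, a scan
over an "online" mask containing only CPU 2 finds nothing, whereas a scan
over the "present" mask finds it on CPU 0.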