Re: [PATCH v3 0/3] x86/crash: Fix double NMI shootdown bug

From: Sean Christopherson
Date: Tue Nov 15 2022 - 14:36:33 EST


On Tue, Nov 15, 2022, Guilherme G. Piccoli wrote:
> On 14/11/2022 20:34, Sean Christopherson wrote:
> > [...]
> > v3:
> > - Re-collect Guilherme's Tested-by.
> > - Tweak comment in patch 1 to reference STGI instead of CLGI.
> > - Celebrate this series' half-birthday.
>
> Heheh
>
> Thanks a lot for persisting with this, Sean, much appreciated! I'm
> surprised at how long it is taking to get these _fixes_ merged in the
> kernel, hence your effort is very valuable =)

Well, to be fair, the fixes aren't perfect. Aside from the GIF thing, patch 2
breaks CONFIG_SMP=n.

I think there's another bug lurking too. The emergency reboot path doesn't
VMCLEAR VMCSes. AFAIK, Intel doesn't guarantee the VMCS caches are purged on
INIT, so if the reboot doesn't actually RESET CPUs, the new kernel could observe
memory corruption due to an old VMCS getting written back.

Argh, and I missed sysvec_reboot() + smp_stop_nmi_callback() for SVM support.

And slightly longer term, this entire mess can be cleaned up. Once KVM's handling
of VMX/SVM initialization sucks less[*], all of the disabling logic can be moved
into KVM callbacks and the kernel can stop speculatively trying to disable VMX/SVM.
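
As a rough illustration of the callback idea, here is a user-space model. It is
not kernel code, and every name in it is hypothetical. The emergency path only
invokes a hook that the hypervisor registered, instead of guessing whether
VMX/SVM might be enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; a hypervisor would supply one of these. */
typedef void (*virt_disable_cb_t)(void);

/* NULL means no hypervisor is loaded, so there is nothing to disable. */
static virt_disable_cb_t virt_disable_cb;

/* Stand-in for "VMX/SVM is enabled on this CPU" in this model. */
static int virt_enabled;

static void register_virt_disable_cb(virt_disable_cb_t cb)
{
	virt_disable_cb = cb;
}

/* What the hypervisor's callback would do in this model. */
static void model_hypervisor_disable(void)
{
	virt_enabled = 0;
}

/* Emergency path: call the hook if present, never guess at hardware state. */
static void emergency_disable_virt(void)
{
	virt_disable_cb_t cb = virt_disable_cb;

	if (cb)
		cb();
}

/* Small self-check of the model; returns 0 on success. */
static int demo(void)
{
	virt_enabled = 1;
	emergency_disable_virt();	/* no callback registered: no-op */
	assert(virt_enabled == 1);

	register_virt_disable_cb(model_hypervisor_disable);
	emergency_disable_virt();	/* callback disables virtualization */
	assert(virt_enabled == 0);
	return 0;
}
```

The real thing would also need to handle the callback racing with module
unload and running in NMI context, which this sketch deliberately ignores.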

I'll send a v4 to fix all of the suspected bugs, and then work on another series to
clean up the callbacks, which will have dependencies on both the kvm_init() rework
and this series.

[*] https://lore.kernel.org/all/20221102231911.3107438-1-seanjc@xxxxxxxxxx