Re: [patch 0/6] Cure kexec() vs. mwait_play_dead() troubles

From: Paolo Bonzini
Date: Fri Jun 09 2023 - 04:42:20 EST


On 6/7/23 19:33, Sean Christopherson wrote:
>>>> Don't most SMM handlers rendezvous all CPUs? I.e. won't blocking SMIs indefinitely
>>>> potentially cause problems too?
>>>
>>> Not that I'm aware of. If so then this would be a hideous firmware bug
>>> as firmware must be aware of CPUs which hang around in INIT independent
>>> of this.
>>
>> SMM does do the rendezvous of all CPUs, but also has a way to detect the
>> blocked ones, in WFS via some package scoped ubox register. So it knows to
>> skip those. I can find this in internal sources, but they aren't available
>> in the edk2 open reference code. They happen to be documented only in the
>> BWG, which isn't available freely.
>
> Ah, so putting CPUs into WFS shouldn't result in odd delays. At least not on
> bare metal. Hmm, and AFAIK the primary use case for SMM in VMs is for secure
> boot, so taking SMIs after booting and putting CPUs back into WFS should be ok-ish.

VMs do not have things like periodic or watchdog SMIs; they only enter SMM in response to IPIs or writes to 0xB2.

The writes to 0xB2 in turn should only happen from UEFI runtime services related to the UEFI variable store. Another possibility could be ACPI bytecode from either DSDT or APEI; not implemented yet and very unlikely to happen in the future, but not impossible either.

Either way they should not happen before the kexec-ed kernel has brought up all CPUs.

Paolo

> Finding a victim to test this in a QEMU VM w/ Secure Boot would be nice to have.