Re: [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure

From: Sean Christopherson
Date: Wed Jun 07 2023 - 16:09:00 EST


On Wed, Jun 07, 2023, Isaku Yamahata wrote:
> On Wed, Jun 07, 2023 at 12:27:33PM -0700,
> Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> > On 6/7/23 11:53, Isaku Yamahata wrote:
> > >>> VMX enabling, and KVM is the only user of TDX. This implementation
> > >>> chooses to make KVM itself responsible for enabling VMX before using
> > >>> TDX and let the rest of the kernel stay blissfully unaware of VMX.
> > >>>
> > >>> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The
> > >>> kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX
> > >>> first. Architecturally, there is no CPU flag to check whether the CPU
> > >>> is in VMX operation. Also, if a BIOS were buggy, it could still report
> > >>> valid TDX private KeyIDs when TDX actually couldn't be enabled.
> > >> I'm not sure this is a great justification. If the BIOS is lying to the
> > >> OS, we _should_ oops.
> > >>
> > >> How else can this happen other than silly kernel bugs. It's OK to oops
> > >> in the face of silly kernel bugs.
> > > TDX KVM + reboot can hit #UD. On reboot, VMX is disabled (VMXOFF) via
> > > syscore.shutdown callback. However, guest TD can be still running to issue
> > > SEAMCALL resulting in #UD.
> > >
> > > Or we can postpone the change and make the TDX KVM patch series carry a patch
> > > for it.
> >
> > How does the existing KVM use of VMLAUNCH/VMRESUME avoid that problem?
>
> extable. From arch/x86/kvm/vmx/vmenter.S
>
> .Lvmresume:
> vmresume
> jmp .Lvmfail
>
> .Lvmlaunch:
> vmlaunch
> jmp .Lvmfail
>
> _ASM_EXTABLE(.Lvmresume, .Lfixup)
> _ASM_EXTABLE(.Lvmlaunch, .Lfixup)

More specifically, KVM eats faults on VMX and SVM instructions that occur after
KVM forcefully disables VMX/SVM.

E.g. with reboot -f, this will be reached without first stopping VMs:

static void kvm_shutdown(void)
{
/*
* Disable hardware virtualization and set kvm_rebooting to indicate
* that KVM has asynchronously disabled hardware virtualization, i.e.
* that relevant errors and exceptions aren't entirely unexpected.
* Some flavors of hardware virtualization need to be disabled before
* transferring control to firmware (to perform shutdown/reboot), e.g.
* on x86, virtualization can block INIT interrupts, which are used by
* firmware to pull APs back under firmware control. Note, this path
* is used for both shutdown and reboot scenarios, i.e. neither name is
* 100% comprehensive.
*/
pr_info("kvm: exiting hardware virtualization\n");
kvm_rebooting = true;
on_each_cpu(hardware_disable_nolock, NULL, 1);
}

which KVM x86 (VMX and SVM) then queries when deciding what to do with a spurious
fault on a VMX/SVM instruction

/*
* Handle a fault on a hardware virtualization (VMX or SVM) instruction.
*
* Hardware virtualization extension instructions may fault if a reboot turns
* off virtualization while processes are running. Usually after catching the
* fault we just panic; during reboot instead the instruction is ignored.
*/
noinstr void kvm_spurious_fault(void)
{
/* Fault while not rebooting. We want the trace. */
BUG_ON(!kvm_rebooting);
}
EXPORT_SYMBOL_GPL(kvm_spurious_fault);