Re: [PATCH v19 078/130] KVM: TDX: Implement TDX vcpu enter/exit path

From: Isaku Yamahata
Date: Fri Mar 15 2024 - 16:43:05 EST


On Fri, Mar 15, 2024 at 10:26:30AM -0700,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:

> > diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> > index d822e790e3e5..81d301fbe638 100644
> > --- a/arch/x86/kvm/vmx/tdx.h
> > +++ b/arch/x86/kvm/vmx/tdx.h
> > @@ -27,6 +27,37 @@ struct kvm_tdx {
> > 	struct page *source_page;
> > };
> >
> > +union tdx_exit_reason {
> > +	struct {
> > +		/* 31:0 mirror the VMX Exit Reason format */
>
> Then use "union vmx_exit_reason", having to maintain duplicate copies of the same
> union is not something I want to do.
>
> I'm honestly not even convinced that "union tdx_exit_reason" needs to exist. I
> added vmx_exit_reason because we kept having bugs where KVM would fail to strip
> bits 31:16, and because nested VMX needs to stuff failed_vmentry, but I don't
> see a similar need for TDX.
>
> I would even go so far as to say the vcpu_tdx field shouldn't be exit_reason,
> and instead should be "return_code" or something. E.g. if the TDX module refuses
> to run the vCPU, there's no VM-Enter and thus no VM-Exit (unless you count the
> SEAMCALL itself, har har). Ditto for #GP or #UD on the SEAMCALL (or any other
> reason that generates TDX_SW_ERROR).
>
> Ugh, I'm doubling down on that suggestion. This:
>
> 	WARN_ON_ONCE(!kvm_rebooting &&
> 		     (tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR);
>
> 	if ((u16)tdx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> 	    is_nmi(tdexit_intr_info(vcpu))) {
> 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> 		vmx_do_nmi_irqoff();
> 		kvm_after_interrupt(vcpu);
> 	}
>
> is heinous. If there's an error that leaves bits 15:0 zero, KVM will synthesize
> a spurious NMI. I don't know whether or not that can happen, but it's not
> something that should even be possible in KVM, i.e. the exit reason should be
> processed if and only if KVM *knows* there was a sane VM-Exit from non-root mode.
>
> tdx_vcpu_run() has a similar issue, though it's probably benign. If there's an
> error in bits 15:0 that happens to collide with EXIT_REASON_TDCALL, weird things
> will happen.
>
> 	if (tdx->exit_reason.basic == EXIT_REASON_TDCALL)
> 		tdx->tdvmcall.rcx = vcpu->arch.regs[VCPU_REGS_RCX];
> 	else
> 		tdx->tdvmcall.rcx = 0;
>
> I vote for something like the below, with much more robust checking of vp_enter_ret
> before it's converted to a VMX exit reason.
>
> static __always_inline union vmx_exit_reason tdexit_exit_reason(struct kvm_vcpu *vcpu)
> {
> 	return (u32)vcpu->vp_enter_ret;
> }

Thank you for the concrete suggestion. Let me explore what safeguard checks
can be added to make the exit path robust.
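
As a rough illustration of the kind of check being discussed, here is a
standalone sketch (plain C, not KVM code). Every name prefixed with sketch_
or SKETCH_ is hypothetical. The constant values are assumptions for
illustration and should be checked against the real headers. The rule
"bit 63 set means there was no sane VM-Exit" is a deliberate simplification,
since the TDX module may define status classes that need finer handling.
The idea is only that the low 32 bits are never interpreted as a VMX exit
reason unless the return code has been vetted first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_TDX_ERROR			(1ULL << 63)
#define SKETCH_TDX_SW_ERROR			(SKETCH_TDX_ERROR | (0xffULL << 40))
#define SKETCH_EXIT_REASON_EXCEPTION_NMI	0u
/* Hypothetical sentinel that cannot match any basic exit reason. */
#define SKETCH_EXIT_REASON_INVALID		0xffffffffu

/* Simplifying assumption: any return code with bit 63 set is a failure. */
static inline bool sketch_vp_enter_failed(uint64_t vp_enter_ret)
{
	return (vp_enter_ret & SKETCH_TDX_ERROR) != 0;
}

/* Convert to an exit reason only after the return code has been vetted. */
static inline uint32_t sketch_exit_reason(uint64_t vp_enter_ret)
{
	if (sketch_vp_enter_failed(vp_enter_ret))
		return SKETCH_EXIT_REASON_INVALID;

	return (uint32_t)vp_enter_ret;
}

/* The NMI path is taken only when a vetted exit reason says so. */
static inline bool sketch_is_exception_nmi_exit(uint64_t vp_enter_ret)
{
	return (uint16_t)sketch_exit_reason(vp_enter_ret) ==
	       SKETCH_EXIT_REASON_EXCEPTION_NMI;
}
```

With this shape, an error value whose bits 15:0 happen to be zero no longer
looks like EXIT_REASON_EXCEPTION_NMI, because the sentinel is returned
before the low bits are ever examined.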
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>