Re: [RFC PATCH 00/35] SEV-ES hypervisor support

From: Sean Christopherson
Date: Mon Nov 30 2020 - 13:15:34 EST


On Mon, Nov 30, 2020, Paolo Bonzini wrote:
> On 16/09/20 02:19, Sean Christopherson wrote:
> >
> > TDX also selectively blocks/skips portions of other ioctl()s so that the
> > TDX code itself can yell loudly if e.g. .get_cpl() is invoked. The event
> > injection restrictions are due to direct injection not being allowed (except
> > for NMIs); all IRQs have to be routed through APICv (posted interrupts) and
> > exception injection is completely disallowed.
> >
> > kvm_vcpu_ioctl_x86_get_vcpu_events:
> > 	if (!vcpu->kvm->arch.guest_state_protected)
> > 		events->interrupt.shadow = kvm_x86_ops.get_interrupt_shadow(vcpu);
>
> Perhaps an alternative implementation can enter the vCPU with immediate exit
> until no events are pending, and then return all zeroes?

This can't work. If the guest has STI blocking, e.g. it did STI->TDVMCALL with
a valid vIRQ in GUEST_RVI, then events->interrupt.shadow should technically be
non-zero to reflect the STI blocking. But the immediate exit (a hardware IRQ
for TDX guests) will cause VM-Exit before the guest can execute any instructions
and thus the guest will never clear STI blocking and never consume the pending
event. Or there could be a valid vIRQ, but GUEST_RFLAGS.IF=0, in which case KVM
would need to run the guest for an indeterminate amount of time to wait for the
vIRQ to be consumed.
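
To make the two failure cases concrete, here is a toy C model. It is not KVM or
TDX code: struct toy_vcpu, toy_enter_guest() and toy_drain_with_immediate_exit()
are hypothetical names, and the model generously assumes that a deliverable vIRQ
is consumed on entry even when that entry exits immediately.

```c
#include <stdbool.h>

/* Toy guest state; field names are illustrative, not real KVM/TDX fields. */
struct toy_vcpu {
	bool sti_blocking;	/* interrupt shadow left behind by STI */
	bool rflags_if;		/* GUEST_RFLAGS.IF */
	bool virq_pending;	/* valid vIRQ in GUEST_RVI */
};

/* A vIRQ is consumed only if IF=1 and there is no STI shadow. */
static void toy_try_deliver(struct toy_vcpu *v)
{
	if (v->virq_pending && v->rflags_if && !v->sti_blocking)
		v->virq_pending = false;
}

/*
 * Enter the guest and let it retire `insns` instructions before the exit.
 * insns == 0 models an entry that is forced to exit immediately.
 * Assumption: the guest code never changes RFLAGS.IF on its own.
 */
static void toy_enter_guest(struct toy_vcpu *v, unsigned int insns)
{
	toy_try_deliver(v);
	for (unsigned int i = 0; i < insns; i++) {
		v->sti_blocking = false;	/* shadow ends after one instruction */
		toy_try_deliver(v);
	}
}

/* The proposed loop: re-enter with immediate exit until nothing is pending. */
static bool toy_drain_with_immediate_exit(struct toy_vcpu *v,
					  unsigned int max_entries)
{
	for (unsigned int i = 0; i < max_entries && v->virq_pending; i++)
		toy_enter_guest(v, 0);

	return !v->virq_pending;	/* true only if the event was consumed */
}
```

In this model, a vCPU with sti_blocking set, or with rflags_if clear, still has
its vIRQ pending after any number of immediate-exit entries. The STI case only
resolves once the guest retires at least one instruction, and the IF=0 case
only resolves when the guest itself decides to set IF.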

Tangentially related, I haven't looked through the official external TDX docs,
but I suspect that vmcs.GUEST_RVI is listed as inaccessible for production TDs.
This will be changed, as the VMM needs access to GUEST_RVI to handle
STI->TDVMCALL(HLT); otherwise the VMM may incorrectly put the vCPU into a
blocked (not runnable) state even though it has a pending wake event.
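
A minimal sketch of that HLT decision, again as a toy and not actual VMM code:
toy_block_on_hlt() and its parameters are hypothetical, and the only wake
source modeled is a valid vIRQ in GUEST_RVI.

```c
#include <stdbool.h>

/*
 * Decide whether to block the vCPU after STI->TDVMCALL(HLT).
 * rvi_readable: whether the VMM is allowed to read GUEST_RVI.
 * rvi_valid:    whether GUEST_RVI actually holds a pending vIRQ.
 */
static bool toy_block_on_hlt(bool rvi_readable, bool rvi_valid)
{
	/* A wake event the VMM cannot see looks like no wake event at all. */
	bool wake_pending_seen = rvi_readable && rvi_valid;

	return !wake_pending_seen;
}
```

With rvi_readable false and rvi_valid true, the sketch blocks the vCPU despite
the pending vIRQ, which is the incorrect outcome described above.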