Re: [PATCHv2.1 05/29] x86/tdx: Add HLT support for TDX guests

From: Kirill A. Shutemov
Date: Wed Feb 09 2022 - 09:34:03 EST


On Mon, Feb 07, 2022 at 10:52:19PM +0000, Sean Christopherson wrote:
> On Fri, Feb 04, 2022, Kirill A. Shutemov wrote:
> > > > Looks dubious to me, I donno. I worry about possible conflicts with the
> > > > spec in the future.
> > >
> > > The spec should have a reserved space for such things :)
>
> Heh, the problem is someone has to deal with munging the two things together.
> E.g. if there's a EXIT_REASON_SAFE_HLT then the hypervisor would need a handler
> that's identical to EXIT_REASON_HLT, except with guest.EFLAGS.IF forced to '1'.
> The guest gets the short end of the stick because EXIT_REASON_HLT is already an
> established VM-Exit reason.
>
> > > But you might think about having a in/out struct similar to the module
> > > call or just an array of u64.
> > >
> > > and the signature would become:
> > >
> > > __tdx_hypercall(u64 op, u64 flags, struct inout *args)
> > > __tdx_hypercall(u64 op, u64 flags, u64 *args)
> > >
> > > and have flag bits:
> > >
> > > HCALL_ISSUE_STI
> > > HCALL_HAS_OUTPUT
> > >
> > > Hmm?
> >
> > We have two distinct cases: standard hypercalls (defined in GHCI) and KVM
> > hypercalls. In the first case R10 is 0 (indicating standard TDVMCALL) and
> > R11 defines the operation. For KVM hypercalls R10 encodes the operation
> > (KVM hypercalls indexed from 1) and R11 is the first argument. So we
> > cannot get away with simple "RDI is op" interface.
> >
> > And we need to return two values: RAX indicates if TDCALL itself was
> > successful and R10 is result of the hypercall. So we cannot easily get
> > away without output struct. HCALL_HAS_OUTPUT is not needed.
>
> But __tdx_hypercall() should never fail TDCALL. The TDX spec even says:
>
> RAX TDCALL instruction return code. Always returns Intel TDX_SUCCESS (0).
>
> IIRC, the original PoC went straight to a ud2 if tdcall failed. Why not do
> something similar? That would get rid of the bajillion instances of:
>
> if (__tdx_hypercall(...))
> panic("Hypercall fn %llu failed (Buggy TDX module!)\n", fn);

Okay, below is my take on it. Boot tested.

I used UD2 as a way to deal with it-never-heppens condition. Not sure if
it the right way, but stack trace looks useful and I see it being used for
CONFIG_DEBUG_ENTRY.

Any comments?

/*
* __tdx_hypercall() - Make hypercalls to a TDX VMM.
*
* Transforms values in function call argument struct tdx_hypercall_args @args
* into the TDCALL register ABI. After TDCALL operation, VMM output is saved
* back in @args.
*
*-------------------------------------------------------------------------
* TD VMCALL ABI:
*-------------------------------------------------------------------------
*
* Input Registers:
*
* RAX - TDCALL instruction leaf number (0 - TDG.VP.VMCALL)
* RCX - BITMAP which controls which part of TD Guest GPR
* is passed as-is to the VMM and back.
* R10 - Set 0 to indicate TDCALL follows standard TDX ABI
* specification. Non zero value indicates vendor
* specific ABI.
* R11 - VMCALL sub function number
* RBX, RBP, RDI, RSI - Used to pass VMCALL sub function specific arguments.
* R8-R9, R12-R15 - Same as above.
*
* Output Registers:
*
* RAX - TDCALL instruction status (Not related to hypercall
* output).
* R10 - Hypercall output error code.
* R11-R15 - Hypercall sub function specific output values.
*
*-------------------------------------------------------------------------
*
* __tdx_hypercall() function ABI:
*
* @args (RDI) - struct tdx_hypercall_args for input and output
* @flags (RSI) - TDX_HCALL_* flags
*
* On successful completion, return the hypercall error code.
*/
SYM_FUNC_START(__tdx_hypercall)
FRAME_BEGIN

/* Save callee-saved GPRs as mandated by the x86_64 ABI */
push %r15
push %r14
push %r13
push %r12

/* Mangle function call ABI into TDCALL ABI: */
/* Set TDCALL leaf ID (TDVMCALL (0)) in RAX */
xor %eax, %eax

/* Copy hypercall registers from arg struct: */
movq TDX_HYPERCALL_r10(%rdi), %r10
movq TDX_HYPERCALL_r11(%rdi), %r11
movq TDX_HYPERCALL_r12(%rdi), %r12
movq TDX_HYPERCALL_r13(%rdi), %r13
movq TDX_HYPERCALL_r14(%rdi), %r14
movq TDX_HYPERCALL_r15(%rdi), %r15

movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx

/*
* For the idle loop STI needs to be called directly before
* the TDCALL that enters idle (EXIT_REASON_HLT case). STI
* instruction enables interrupts only one instruction later.
* If there is a window between STI and the instruction that
* emulates the HALT state, there is a chance for interrupts to
* happen in this window, which can delay the HLT operation
* indefinitely. Since this is the not the desired result,
* conditionally call STI before TDCALL.
*/
testq $TDX_HCALL_ISSUE_STI, %rsi
jz .Lskip_sti
sti
.Lskip_sti:
tdcall

/*
* TDVMCALL leaf does not suppose to fail. If it fails something
* is horribly wrong with TDX module. Stop the world.
*/
test %rax, %rax
je .Lsuccess
ud2
.Lsuccess:
/* TDVMCALL leaf return code is in R10 */
movq %r10, %rax

/* Copy hypercall result registers to arg struct if needed */
testq $TDX_HCALL_HAS_OUTPUT, %rsi
jz .Lout

movq %r10, TDX_HYPERCALL_r10(%rdi)
movq %r11, TDX_HYPERCALL_r11(%rdi)
movq %r12, TDX_HYPERCALL_r12(%rdi)
movq %r13, TDX_HYPERCALL_r13(%rdi)
movq %r14, TDX_HYPERCALL_r14(%rdi)
movq %r15, TDX_HYPERCALL_r15(%rdi)
.Lout:
/*
* Zero out registers exposed to the VMM to avoid
* speculative execution with VMM-controlled values.
* This needs to include all registers present in
* TDVMCALL_EXPOSE_REGS_MASK (except R12-R15).
* R12-R15 context will be restored.
*/
xor %r10d, %r10d
xor %r11d, %r11d

/* Restore callee-saved GPRs as mandated by the x86_64 ABI */
pop %r12
pop %r13
pop %r14
pop %r15

FRAME_END

retq
SYM_FUNC_END(__tdx_hypercall)
--
Kirill A. Shutemov