RE: [PATCH v8 33/33] KVM: x86/vmx: refactor VMX_DO_EVENT_IRQOFF to generate FRED stack frames

From: Li, Xin3
Date: Wed Apr 12 2023 - 14:26:27 EST



> And then this is equally gross. Rather than funnel FRED+legacy into a single
> function only to split them back out, just route FRED into its own asm subroutine.
> The common bits are basically the creation/destruction of the stack frame and
> the CALL itself, i.e. the truly interesting bits are what's different.

I'm trying to catch up with you, but I'm still confused.

Because a FRED stack frame always contains an error code pushed after RIP,
the FRED entry code doesn't push any error code.
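
For reference, the hardware-built frame looks roughly like this (offsets
relative to the new RSP, field ordering per the push order in your macro
below):

    +56  reserved, must be 0
    +48  event data (0 for NMI and external interrupts)
    +40  SS + event information (event type, vector, ...)
    +32  RSP
    +24  RFLAGS
    +16  CS + auxiliary information
     +8  RIP
     +0  error code    <- RSP after event delivery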

That's why I introduced trampoline code, which is reached via CALL so that
the return instruction address gets pushed first. The trampoline then pushes
an error code (0 for both IRQ and NMI) and either jumps to
fred_entrypoint_kernel() for NMI handling or calls external_interrupt() for
IRQ handling.

The return RIP is used to return from fred_entrypoint_kernel(), but not from
external_interrupt().
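
Roughly, the shape is (just a sketch, not the literal code from the patch):

    /* Reached via CALL, so the return RIP is already on the stack. */
    push $0                     /* error code, 0 for both IRQ and NMI */

    /* NMI: the pushed return RIP is used on the way back out. */
    jmp fred_entrypoint_kernel

    /* IRQ, instead of the JMP above; the return RIP is not used here. */
    call external_interrupt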


> Pretty much all of the #ifdeffery goes away, the helpers just need #ifdefs to play
> nice with CONFIG_X86_FRED=n. E.g. something like the below as a starting point
> (it most definitely doesn't compile, and most definitely isn't 100% correct).
>
> ---
> arch/x86/kvm/vmx/vmenter.S | 72 ++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.c     | 19 ++++++++--
> 2 files changed, 88 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index 631fd7da2bc3..a6929c78e038 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -2,12 +2,14 @@
> #include <linux/linkage.h>
> #include <asm/asm.h>
> #include <asm/bitsperlong.h>
> +#include <asm/fred.h>
> #include <asm/kvm_vcpu_regs.h>
> #include <asm/nospec-branch.h>
> #include <asm/percpu.h>
> #include <asm/segment.h>
> #include "kvm-asm-offsets.h"
> #include "run_flags.h"
> +#include "../../entry/calling.h"
>
> #define WORD_SIZE (BITS_PER_LONG / 8)
>
> @@ -31,6 +33,62 @@
> #define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE
> #endif
>
> +#ifdef CONFIG_X86_FRED
> +.macro VMX_DO_FRED_EVENT_IRQOFF call_target cs_val
> + /*
> + * Unconditionally create a stack frame, getting the correct RSP on the
> + * stack (for x86-64) would take two instructions anyways, and RBP can
> + * be used to restore RSP to make objtool happy (see below).
> + */
> + push %_ASM_BP
> + mov %_ASM_SP, %_ASM_BP
> +
> + /*
> + * Don't check the FRED stack level, the call stack leading to this
> + * helper is effectively constant and shallow (relatively speaking).
> + *
> + * Emulate the FRED-defined redzone and stack alignment (128 bytes and
> + * 64 bytes respectively).
> + */
> + sub $(FRED_CONFIG_REDZONE_AMOUNT << 6), %rsp
> + and $FRED_STACK_FRAME_RSP_MASK, %rsp
> +
> + /*
> + * A FRED stack frame has extra 16 bytes of information pushed at the
> + * regular stack top compared to an IDT stack frame.
> + */
> + push $0 /* Reserved by FRED, must be 0 */
> + push $0 /* FRED event data, 0 for NMI and external interrupts */
> + shl $32, %rax
> + orq $(__KERNEL_DS | FRED_64_BIT_MODE), %rax
> + push %rax /* Vector (from the "caller") and DS */
> +
> + push %rbp
> + pushf
> + push \cs_val

We need to push the RIP of the next instruction here. Or are you suggesting
we don't need to care about it because it may not be used to return from the
callee?

As mentioned above, the return RIP is used when returning from NMI handling.

Or have I totally missed a key idea about how to build a FRED stack frame?
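
To be concrete, I was thinking of something like this between the CS push
and the error code push (just a sketch):

    lea 1f(%rip), %rax
    push %rax                   /* return RIP for the FRED stack frame */
    push $0                     /* error code */
    /* PUSH_REGS, load @pt_regs, etc. */
    call \call_target
1: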

Thanks!
Xin

> + push $0 /* FRED error code, 0 for NMI and external interrupts */
> + PUSH_REGS
> +
> + /* Load @pt_regs */
> + movq %rsp, %_ASM_ARG1
> +
> + call \call_target
> +
> + POP_REGS
> +
> + /*
> + * "Restore" RSP from RBP, even though IRET has already unwound RSP to
> + * the correct value. objtool doesn't know the callee will IRET and,
> + * without the explicit restore, thinks the stack is getting walloped.
> + * Using an unwind hint is problematic due to x86-64's dynamic alignment.
> + */
> + mov %_ASM_BP, %_ASM_SP
> + pop %_ASM_BP
> + RET
> +.endm
> +#endif
> +