Re: [PATCH 2/6] x86/entry/64: Convert SYSRET validation tests to C

From: Mika Penttilä
Date: Tue Jul 18 2023 - 10:50:42 EST




On 18.7.2023 17.25, Brian Gerst wrote:
> On Tue, Jul 18, 2023 at 10:17 AM Mika Penttilä <mpenttil@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> On 18.7.2023 16.44, Brian Gerst wrote:
>>> Signed-off-by: Brian Gerst <brgerst@xxxxxxxxx>
>>> ---
>>>  arch/x86/entry/common.c | 50 ++++++++++++++++++++++++++++++-
>>>  arch/x86/entry/entry_64.S | 55 ++--------------------------------
>>>  arch/x86/include/asm/syscall.h | 2 +-
>>>  3 files changed, 52 insertions(+), 55 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
>>> index 6c2826417b33..afe79c3f1c5b 100644
>>> --- a/arch/x86/entry/common.c
>>> +++ b/arch/x86/entry/common.c
>>> @@ -70,8 +70,12 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
>>>  	return false;
>>>  }
>>>
>>> -__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
>>> +/* Returns true to return using SYSRET, or false to use IRET */
>>> +__visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
>>>  {
>>> +	long rip;
>>> +	unsigned int shift_rip;
>>> +
>>>  	add_random_kstack_offset();
>>>  	nr = syscall_enter_from_user_mode(regs, nr);
>>>
>>> @@ -84,6 +88,50 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
>>>
>>>  	instrumentation_end();
>>>  	syscall_exit_to_user_mode(regs);
>>> +
>>> +	/*
>>> +	 * Check that the register state is valid for using SYSRET to exit
>>> +	 * to userspace. Otherwise use the slower but fully capable IRET
>>> +	 * exit path.
>>> +	 */
>>> +
>>> +	/* XEN PV guests always use IRET path */
>>> +	if (cpu_feature_enabled(X86_FEATURE_XENPV))
>>> +		return false;
>>> +
>>> +	/* SYSRET requires RCX == RIP and R11 == EFLAGS */
>>> +	if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
>>> +		return false;
>>> +
>>> +	/* CS and SS must match the values set in MSR_STAR */
>>> +	if (unlikely(regs->cs != __USER_CS || regs->ss != __USER_DS))
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
>>> +	 * in kernel space. This essentially lets the user take over
>>> +	 * the kernel, since userspace controls RSP.
>>> +	 *
>>> +	 * Change top bits to match most significant bit (47th or 56th bit
>>> +	 * depending on paging mode) in the address.
>>> +	 */
>>> +	shift_rip = (64 - __VIRTUAL_MASK_SHIFT + 1);
>>
>> Should this be:
>>
>> shift_rip = (64 - __VIRTUAL_MASK_SHIFT - 1);
>> ?
>
> I removed a set of parentheses, which switched the sign from -1 to +1.
> I could put it back if that's less confusing.


I mean, isn't it supposed to be:
	shift_rip = (64 - 48)
for 4-level paging? With the patch it is now:
	shift_rip = (64 - 46)
because
	__VIRTUAL_MASK_SHIFT == 47
so 64 - 47 + 1 = 18, whereas 64 - (47 + 1) = 16.
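
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) of a shift-based sign-extension check. VIRTUAL_MASK_SHIFT,
SHIFT_ORIG, SHIFT_PATCH and survives_sign_extension() are names made up for
this illustration. It assumes 4-level paging and a compiler (gcc/clang) that
does an arithmetic right shift on signed values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for __VIRTUAL_MASK_SHIFT with 4-level paging. */
#define VIRTUAL_MASK_SHIFT 47

/* The shift I would expect: 64 - (47 + 1). */
#define SHIFT_ORIG  (64 - (VIRTUAL_MASK_SHIFT + 1))
/* The shift as written in the patch: 64 - 47 + 1. */
#define SHIFT_PATCH (64 - VIRTUAL_MASK_SHIFT + 1)

static_assert(SHIFT_ORIG == 16, "expected shift for 48-bit addresses");
static_assert(SHIFT_PATCH == 18, "shift produced by the patched expression");

/*
 * Drop the top 'shift' bits, then sign-extend back from the new top bit.
 * The address is unchanged only if the dropped bits were all copies of
 * that bit. Relies on arithmetic right shift of a signed value, which is
 * implementation-defined in ISO C but what gcc and clang do.
 */
static bool survives_sign_extension(uint64_t addr, unsigned int shift)
{
	int64_t v = (int64_t)(addr << shift);

	return (uint64_t)(v >> shift) == addr;
}
```

With a shift of 16, 0x00007fffffffffff survives and 0x0000800000000000 does
not, which is the canonical check for 48-bit addresses. With a shift of 18,
0x00007fffffffffff no longer survives, so any user RIP at or above 1 << 45
would presumably be treated as non-canonical and take the IRET path.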


> Brian Gerst