Re: [PATCH v1 3/3] Documentation: x86-64: Document registers on entry and exit

From: Andy Lutomirski
Date: Fri Jan 07 2022 - 19:02:37 EST


On 1/7/22 15:52, Ammar Faizi wrote:
There was a contentious discussion about the wording in the System
V ABI document regarding which registers the kernel is allowed to
clobber when userspace executes the syscall instruction.

The discussion was resolved by reviewing the clobber list in the
glibc source. For historical reasons rooted in glibc's syscall
wrappers, the kernel must restore all registers before returning to
userspace (except for rax, rcx and r11).

On Wed, 13 Oct 2021 at 16:24:28 +0000, Michael Matz <matz@xxxxxxx> wrote:
It might also be interesting to know that while the wording of the psABI
was indeed intended to imply that all argument registers are potentially
clobbered (like with normal calls) glibc's inline assembler to call
syscalls relies on most registers to actually be preserved:

# define REGISTERS_CLOBBERED_BY_SYSCALL "cc", "r11", "cx"
...
#define internal_syscall6(number, arg1, arg2, arg3, arg4, arg5, arg6) \
({ \
unsigned long int resultvar; \
TYPEFY (arg6, __arg6) = ARGIFY (arg6); \
TYPEFY (arg5, __arg5) = ARGIFY (arg5); \
TYPEFY (arg4, __arg4) = ARGIFY (arg4); \
TYPEFY (arg3, __arg3) = ARGIFY (arg3); \
TYPEFY (arg2, __arg2) = ARGIFY (arg2); \
TYPEFY (arg1, __arg1) = ARGIFY (arg1); \
register TYPEFY (arg6, _a6) asm ("r9") = __arg6; \
register TYPEFY (arg5, _a5) asm ("r8") = __arg5; \
register TYPEFY (arg4, _a4) asm ("r10") = __arg4; \
register TYPEFY (arg3, _a3) asm ("rdx") = __arg3; \
register TYPEFY (arg2, _a2) asm ("rsi") = __arg2; \
register TYPEFY (arg1, _a1) asm ("rdi") = __arg1; \
asm volatile ( \
"syscall\n\t" \
: "=a" (resultvar) \
: "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4), \
"r" (_a5), "r" (_a6) \
: "memory", REGISTERS_CLOBBERED_BY_SYSCALL); \
(long int) resultvar; \
})


Note in particular the missing clobbers or outputs of any of the argument
regs.

So, even though the psABI (might have) meant something else, as glibc is
doing the above we in fact have a de-facto standard that the kernel can't
clobber any of the argument regs. The wording and the linux x86-64
syscall implementation (and use in glibc) all come from the same time in
2001, so there never was a time when the kernel was not saving/restoring
the arg registers, so it can't stop now.

In effect this means the psABI should be clarified to explicitly say that
the arg registers aren't clobbered, i.e. that the mentioned list of
clobbered regs isn't inclusive but exclusive. I will do that.

When I was discussing this with Boris earlier I hadn't yet looked at glibc
use but only gave my interpretation from memory and reading. Obviously
reality trumps anything like that :-)

Link: https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@xxxxxxxxxxxxx/
Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/25

This documents "registers on entry" and "registers on exit".

Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Michael Matz <matz@xxxxxxx>
Cc: "H.J. Lu" <hjl.tools@xxxxxxxxx>
Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: Willy Tarreau <w@xxxxxx>
Cc: x86-ml <x86@xxxxxxxxxx>
Cc: lkml <linux-kernel@xxxxxxxxxxxxxxx>
Cc: GNU/Weeb Mailing List <gwml@xxxxxxxxxxx>
Signed-off-by: Ammar Faizi <ammarfaizi2@xxxxxxxxxxx>
---
Documentation/x86/entry_64.rst | 47 ++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)

diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst
index e433e08f7018..3f2007e2a938 100644
--- a/Documentation/x86/entry_64.rst
+++ b/Documentation/x86/entry_64.rst
@@ -108,3 +108,50 @@ We try to only use IST entries and the paranoid entry code for vectors
that absolutely need the more expensive check for the GS base - and we
generate all 'normal' entry points with the regular (faster) paranoid=0
variant.
+
+
+Registers on entry:
+-------------------

These are the SYSCALL64 registers on entry, not general registers on entry. Also, this has little to do with the entry logic, so it probably doesn't belong in this file.
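
To make the SYSCALL64 distinction concrete, here is a minimal sketch of how userspace sees the convention, assuming x86-64 Linux and GCC or Clang. The helper name raw_syscall3 is hypothetical and is not taken from glibc or the kernel; the clobber list mirrors the REGISTERS_CLOBBERED_BY_SYSCALL definition quoted above, and the register assignments for the first three arguments follow the quoted glibc macro.

```c
#include <sys/syscall.h>   /* SYS_getpid and friends */

/*
 * Hypothetical helper (illustration only): issue a 3-argument syscall
 * through the SYSCALL instruction on x86-64 Linux.
 *
 * On entry:  rax = syscall number, rdi/rsi/rdx = arguments 1..3.
 * On exit:   rax = return value (a negative errno value on failure).
 */
static inline long raw_syscall3(long nr, long a1, long a2, long a3)
{
	long ret;
	register long r_a1 __asm__("rdi") = a1;
	register long r_a2 __asm__("rsi") = a2;
	register long r_a3 __asm__("rdx") = a3;

	__asm__ volatile ("syscall"
			  : "=a" (ret)
			  : "0" (nr), "r" (r_a1), "r" (r_a2), "r" (r_a3)
			  /*
			   * Only rcx, r11 and the flags are listed as
			   * clobbered. The argument registers are plain
			   * inputs, so the compiler assumes the kernel
			   * hands them back unchanged.
			   */
			  : "rcx", "r11", "cc", "memory");
	return ret;
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the caller's PID. Because rdi, rsi and rdx appear only as inputs, any code built this way would break if the kernel stopped restoring them, which is the de-facto constraint described in the quoted discussion.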