Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`

From: Willy Tarreau
Date: Wed Aug 30 2023 - 15:04:50 EST


On Wed, Aug 30, 2023 at 10:44:53PM +0700, Ammar Faizi wrote:
> On Wed, Aug 30, 2023 at 05:23:22PM +0200, Willy Tarreau wrote:
> > Then "xchg %esi, %eax" is just one byte with no memory access ;-)
>
> Perfect!
>
> Now I got this, shorter than "movl %esi, %eax":
> ```
> 0000000000001500 <memset>:
> 1500: 96 xchg %eax,%esi
> 1501: 48 89 d1 mov %rdx,%rcx
> 1504: 57 push %rdi
> 1505: f3 aa rep stos %al,%es:(%rdi)
> 1507: 58 pop %rax
> 1508: c3 ret
> ```
>
> Unfortunately, the xchg trick doesn't yield smaller machine code for
> %rdx, %rcx. Lol.

That's expected: historically "xchg ax, regX" was a single-byte 0x9X
on 8086, and it kept the same encoding when it was extended to 32 bits,
like many instructions (note that NOP is encoded as xchg ax,ax). It remains
short when you can sacrifice the other register, or restore it later using
yet another xchg. For rcx/rdx a push/pop pair could do it, as those should
also be single-byte 0x5X even in long mode unless I'm mistaken. Thus if you
absolutely want to squeeze out that 9th byte and end up with an 8-byte
function, you could probably do:

    xchg %eax, %esi         1
    push %rdx               1
    pop %rcx                1
    push %rdi               1
    rep stosb               2
    pop %rax                1
    ret                     1
    ------------- Total: 8 bytes :-)
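
For anyone who wants to double-check the byte counts without running an
assembler, here is a minimal sketch in C that only tabulates the encodings.
It assumes the string instruction is "rep stosb" (f3 aa), as in the quoted
disassembly, and that only the legacy registers rax..rdi are involved, so no
REX prefix is needed on push/pop. The names (memset_9, memset_8, xchg_eax,
push_r64, pop_r64, check_encodings) are made up for the illustration and are
not part of nolibc:

```c
#include <assert.h>
#include <stddef.h>

/* Register numbers as they appear in the low 3 bits of the opcode. */
enum reg { RAX = 0, RCX = 1, RDX = 2, RBX = 3, RSP = 4, RBP = 5, RSI = 6, RDI = 7 };

/* One-byte forms: the register number is added to a base opcode.
 * Valid only for rax..rdi; r8..r15 would need a REX prefix. */
static unsigned char xchg_eax(enum reg r) { return (unsigned char)(0x90 + r); }
static unsigned char push_r64(enum reg r) { return (unsigned char)(0x50 + r); }
static unsigned char pop_r64(enum reg r)  { return (unsigned char)(0x58 + r); }

/* 9-byte version, bytes copied from the quoted disassembly. */
static const unsigned char memset_9[] = {
	0x96,             /* xchg %eax,%esi */
	0x48, 0x89, 0xd1, /* mov  %rdx,%rcx (REX.W + opcode + ModRM) */
	0x57,             /* push %rdi */
	0xf3, 0xaa,       /* rep stosb */
	0x58,             /* pop  %rax */
	0xc3,             /* ret */
};

/* 8-byte version: the 3-byte mov is replaced by a push/pop pair. */
static const unsigned char memset_8[] = {
	0x96,             /* xchg %eax,%esi */
	0x52,             /* push %rdx */
	0x59,             /* pop  %rcx */
	0x57,             /* push %rdi */
	0xf3, 0xaa,       /* rep stosb */
	0x58,             /* pop  %rax */
	0xc3,             /* ret */
};

/* Cross-check the tables against the "base opcode + register" rule. */
static void check_encodings(void)
{
	assert(xchg_eax(RAX) == 0x90);              /* NOP */
	assert(xchg_eax(RSI) == memset_8[0]);
	assert(push_r64(RDX) == memset_8[1]);
	assert(pop_r64(RCX)  == memset_8[2]);
	assert(push_r64(RDI) == memset_8[3]);
	assert(pop_r64(RAX)  == memset_8[6]);
	assert(sizeof(memset_9) - sizeof(memset_8) == 1);
}
```

Calling check_encodings() from a test program is enough. Nothing here is
executed as machine code, so it builds on any architecture.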

Willy