Re: [PATCH v2 0/6] x86: Clean up fast syscall return validation

From: H. Peter Anvin
Date: Fri Oct 06 2023 - 15:00:27 EST


On 10/5/23 13:20, Ingo Molnar wrote:

* Brian Gerst <brgerst@xxxxxxxxx> wrote:

Looking at the compiled output, the only suboptimal code appears to be
the canonical address test, where the C code uses the CL register for
the shifts instead of immediates.

180: e9 00 00 00 00 jmp 185 <do_syscall_64+0x85>
181: R_X86_64_PC32 .altinstr_aux-0x4
185: b9 07 00 00 00 mov $0x7,%ecx
18a: eb 05 jmp 191 <do_syscall_64+0x91>
18c: b9 10 00 00 00 mov $0x10,%ecx
191: 48 89 c2 mov %rax,%rdx
194: 48 d3 e2 shl %cl,%rdx
197: 48 d3 fa sar %cl,%rdx
19a: 48 39 d0 cmp %rdx,%rax
19d: 75 39 jne 1d8 <do_syscall_64+0xd8>

Yeah, it didn't look equivalent - so I guess we want a C equivalent for:

- ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \
- "shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57

instead of the pgtable_l5_enabled() runtime test that
__is_canonical_address() uses?


I don't think such a thing (without simply duplicate the above as an alternative asm, which is obviously easy enough, and still allows the compiler to pick the register used) would be possible without immediate patching support[*].

Incidentally, this is a question for Uros: is there a reason this is a mov to %ecx and not just %cl, which would save 3 bytes?

Incidentally, it is possible to save one instruction and use only *one* alternative immediate:

leaq (%rax,%rax),%rdx
xorq %rax,%rdx
shrq $(63 - LA),%rdx # Yes, 63, not 64
# ZF=1 if canonical

This works because if bit [x] is set in the output, then bit [x] and [x-1] in the input are different (bit [-1] considered to be zero); and by definition a bit is canonical if and only if all the bits [63:LA] are identical, thus bits [63:LA+1] in the output must all be zero.

The first two instructions are pure arithmetic and can thus be done in C:

bar = foo ^ (foo << 1);

... leaving only one instruction needing to be patched at runtime.

-hpa



[*] which is a whole bit of infrastructure that I know first-hand is extremely complex[**] -- I had an almost-complete patchset done at one time, but it has severely bitrotted. The good part is that it is probably a lot easier to do today, with the alternatives-patching callback routines available.

[**] the absolute best would of course be if such infrastructure could be compiler-supported (probably as part as some really fancy support for alternatives/static branch), but that would probably be a nightmare to do in the compiler or a plugin.