Re: [PATCH] x86: bring back rep movsq for user access on CPUs without ERMS

From: Linus Torvalds
Date: Tue Aug 29 2023 - 16:04:58 EST


On Tue, 29 Aug 2023 at 12:45, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> So I think I know how to fix it, but I'm going to sleep on it.

I think you can just skip the %r8 games, and do a

leal (%rax,%rcx,8),%rcx

in the exception fixup code, since %rax will have the low bits of the
byte count, and %rcx will have the remaining qword count.

We should also have some test case for partial reads somewhere, but I
have to admit that when I did the cleanup patches I just wrote some
silly test myself (ie just doing a 'mmap()' and then reading/writing
into the end of that mmap at different offsets).

I didn't save that hacky thing, I'm afraid.

I also tried to figure out if there is any CPU we should care about
that doesn't like 'rep movsq', but I think you are right that there
really isn't. The "good enough" rep things were introduced in the PPro
if I recall correctly, and while you could disable them in the BIOS,
by the time Intel did 64-bit in Prescott (?) it was pretty much
standard.

So yeah, no reason to have the unrolled loop at all, and I think your
patch is fine conceptually, just needs fixing and testing for the
partial success case.

Oh, and you should also remove the clobbers of r8-r11 in the
copy_user_generic() inline asm in <asm/uaccess_64.h> when you've fixed
the exception handling. The only reason for those clobbers was
that unrolled register use.

So only %rax ends up being a clobber for the rep_movs_alternative
case, as far as I can tell.

Linus