Re: [PATCH] x86: bring back rep movsq for user access on CPUs without ERMS

From: Linus Torvalds
Date: Tue Aug 29 2023 - 15:31:39 EST


On Mon, 28 Aug 2023 at 10:07, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> Hand-rolled mov loops executing in this case are quite pessimal compared
> to rep movsq for bigger sizes. While the upper limit depends on uarch,
> everyone is well south of 1KB AFAICS and sizes bigger than that are
> common. The problem can be easily remedied so do it.

Ok, looking at the actual code now, and your patch is buggy.

> +.Llarge_movsq:
> +	movq %rcx,%r8
> +	movq %rcx,%rax
> +	shrq $3,%rcx
> +	andl $7,%eax
> +6:	rep movsq
> +	movl %eax,%ecx
> 	testl %ecx,%ecx
> 	jne .Lcopy_user_tail
> 	RET

The fixup code is very very broken:

> +/*
> + * Recovery after failed rep movsq
> + */
> +7:	movq %r8,%rcx
> +	jmp .Lcopy_user_tail
> +
> +	_ASM_EXTABLE_UA( 6b, 7b)

That just copies the original value back into %rcx. That's not at all
ok. The "rep movsq" may have succeeded partially, and updated %rcx
(and %rsi/%rdi) accordingly. You will now do the "tail" for entirely
too much, and return the wrong value.

In fact, if this races with an mmap() in another thread, the user
copy might then succeed for the part that used to fault, and in that
case it can end up copying much more than asked for and overrunning
the buffers provided.

So all those games with %r8 are entirely bogus. There is no way that
"save the original length" can ever be relevant or correct.

Linus