Re: [PATCH] x86: bring back rep movsq for user access on CPUs without ERMS

From: Mateusz Guzik
Date: Tue Aug 29 2023 - 15:46:07 EST


On 8/29/23, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, 28 Aug 2023 at 10:07, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>>
>> Hand-rolled mov loops executing in this case are quite pessimal compared
>> to rep movsq for bigger sizes. While the upper limit depends on uarch,
>> everyone is well south of 1KB AFAICS and sizes bigger than that are
>> common. The problem can be easily remedied so do it.
>
> Ok, looking at the actual code now, and your patch is buggy.
>
>> +.Llarge_movsq:
>> + movq %rcx,%r8
>> + movq %rcx,%rax
>> + shrq $3,%rcx
>> + andl $7,%eax
>> +6: rep movsq
>> + movl %eax,%ecx
>> testl %ecx,%ecx
>> jne .Lcopy_user_tail
>> RET
>
> The fixup code is very very broken:
>
>> +/*
>> + * Recovery after failed rep movsq
>> + */
>> +7: movq %r8,%rcx
>> + jmp .Lcopy_user_tail
>> +
>> + _ASM_EXTABLE_UA( 6b, 7b)
>
> That just copies the original value back into %rcx. That's not at all
> ok. The "rep movsq" may have succeeded partially, and updated %rcx
> (and %rsi/%rdi) accordingly. You will now do the "tail" for entirely
> too much, and return the wrong value.
>
> In fact, if this then races with a mmap() in another thread, the user
> copy might end up then succeeding for the part that used to fail, and
> in that case it will possibly end up copying much more than asked for
> and overrunning the buffers provided.
>
> So all those games with %r8 are entirely bogus. There is no way that
> "save the original length" can ever be relevant or correct.
>

Huh, pretty obvious now that you mention it; I don't know why I
thought the registers would be rolled back. More importantly, I should
have checked the handling in the now-removed movsq routine
(copy_user_generic_string):

[snip]
movl %edx,%ecx
shrl $3,%ecx
andl $7,%edx
1: rep movsq
2: movl %edx,%ecx
3: rep movsb
xorl %eax,%eax
ASM_CLAC
RET

11: leal (%rdx,%rcx,8),%ecx
12: movl %ecx,%edx /* ecx is zerorest also */
jmp .Lcopy_user_handle_tail

_ASM_EXTABLE_CPY(1b, 11b)
_ASM_EXTABLE_CPY(3b, 12b)
[/snip]

So I think I know how to fix it, but I'm going to sleep on it.

--
Mateusz Guzik <mjguzik gmail.com>