Re: [PATCH] x86: only use ERMS for user copies for larger sizes

From: Andy Lutomirski
Date: Fri Nov 23 2018 - 13:39:38 EST

Next message: Linus Torvalds: "Re: [PATCH] x86: only use ERMS for user copies for larger sizes"
Previous message: Rich Felker: "Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation"
In reply to: Linus Torvalds: "Re: [PATCH] x86: only use ERMS for user copies for larger sizes"
Next in thread: Linus Torvalds: "Re: [PATCH] x86: only use ERMS for user copies for larger sizes"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> On Nov 23, 2018, at 10:42 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Let me write a generic routine in lib/iomap_copy.c (which already does
>> the "user specifies chunk size" cases), and hook it up for x86.
>
> Something like this?
>
> ENTIRELY UNTESTED! It might not compile. Seriously. And if it does
> compile, it might not work.
>
> And this doesn't actually do the memset_io() function at all, just the
> memcpy ones.
>
> Finally, it's worth noting that on x86, we have this:
>
> /*
> * override generic version in lib/iomap_copy.c
> */
> ENTRY(__iowrite32_copy)
> movl %edx,%ecx
> rep movsd
> ret
> ENDPROC(__iowrite32_copy)
>
> because back in 2006, we did this:
>
> [PATCH] Add faster __iowrite32_copy routine for x86_64
>
> This assembly version is measurably faster than the generic version in
> lib/iomap_copy.c.
>
> which actually implies that "rep movsd" is faster than doing
> __raw_writel() by hand.
>
> So it is possible that this should all be arch-specific code rather
> than that butt-ugly "generic" code I wrote in this patch.
>
> End result: I'm not really all that happy about this patch, but it's
> perhaps worth testing, and it's definitely worth discussing. Because
> our current memcpy_{to,from}io() is truly broken garbage.
>
>

What is memcpy_toio() even supposed to do? I'm guessing it's defined as something like "copy this data to IO space using at most long-sized writes, all aligned, and writing each byte exactly once, in order." That sounds... dubiously useful. I could see a function that writes to aligned memory in specified-sized chunks. And I can see a use for a function to just write it in whatever size chunks the architecture thinks is fastest, and *that* should probably use MOVDIR64B.

Or is there some subtlety I'm missing?
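
To make the two guessed-at semantics concrete, here is a rough userspace sketch in plain C. It is not the kernel's implementation: the names sketch_copy_to_io32() and sketch_memcpy_toio() are made up for illustration, and volatile stores stand in for the real __raw_write*() accessors. The "whatever the architecture thinks is fastest" variant is left out because it would be arch-specific by definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel API: copy `count` 32-bit words to a
 * 4-byte-aligned destination, issuing exactly one 32-bit store per word
 * in ascending order. The caller picks the chunk size by picking this
 * function. The volatile qualifier keeps the compiler from merging or
 * dropping the individual stores.
 */
static void sketch_copy_to_io32(volatile uint32_t *dst,
                                const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}

/*
 * Hypothetical helper, not a kernel API: copy `len` bytes using stores
 * no wider than a long, every store naturally aligned, every destination
 * byte written exactly once, in ascending address order.
 */
static void sketch_memcpy_toio(volatile void *dst, const void *src,
                               size_t len)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    /* Leading bytes: byte stores until the destination is long-aligned. */
    while (len && ((uintptr_t)d % sizeof(long)) != 0) {
        *d++ = *s++;
        len--;
    }

    /* Bulk: one aligned long-sized store per chunk. */
    while (len >= sizeof(long)) {
        unsigned long v;

        memcpy(&v, s, sizeof(v));   /* the source may be unaligned */
        *(volatile unsigned long *)d = v;
        d += sizeof(long);
        s += sizeof(long);
        len -= sizeof(long);
    }

    /* Trailing bytes that do not fill a whole long. */
    while (len) {
        *d++ = *s++;
        len--;
    }
}
```

The first helper is the "specified-sized chunks" case, where the caller guarantees alignment and the store width is part of the contract. The second is the "at most long-sized, aligned, each byte once, in order" guess, where the store width varies with the destination address and the length, which is part of why it looks dubiously useful to me.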
