Re: [PATCH] x86: only use ERMS for user copies for larger sizes

From: Andy Lutomirski
Date: Thu Nov 22 2018 - 13:07:15 EST


On Thu, Nov 22, 2018 at 9:53 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 22, 2018 at 9:36 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
> >
> > The other problem with the ERMS copy is that it gets used
> > for copy_to/from_io() - and the 'rep movsb' on uncached
> > locations has to do byte copies.
>
> Ugh. I thought we changed that *long* ago, because even our non-ERMS
> copy is broken for PCI (it does overlapping stores for the small tail
> cases).
>
> But looking at "memcpy_{from,to}io()", I don't see x86 overriding it
> with anything better.
>
> I suspect nobody uses those functions for anything critical any more.
> The fbcon people have their own copy functions, iirc.
>
> But we definitely should fix this. *NONE* of the regular memcpy
> functions actually work right for PCI space any more, and haven't for
> a long time.

I'm not personally volunteering, but I suspect we can do much better
than we do now:

- The new MOVDIRI and MOVDIR64B instructions can do big writes to WC
and UC memory. I assume those would be safe to use in ...toio()
functions, unless there are quirky devices out there that blow up if
their MMIO space is written in 64-byte chunks.

- MOVNTDQA can, I think, do 64-byte loads, but only from WC memory.
For sufficiently large copies, it could plausibly be faster to create
a WC alias and use MOVNTDQA than it is to copy in 8- or 16-byte
chunks. The i915 driver has a copy implementation using MOVNTDQA --
maybe this should get promoted to something in arch/x86 called
memcpy_from_wc().
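
To make the second idea concrete, here is a rough userspace sketch of
the shape such a helper might take. The names memcpy_from_wc_sketch()
and stream_copy_16() are made up for illustration and are not the i915
code or any existing kernel API. It assumes a 16-byte-aligned source
and a length that is a multiple of 16, and falls back to plain memcpy()
otherwise. On ordinary write-back memory MOVNTDQA behaves like a normal
load, so this only shows the structure, not the speedup. A real kernel
version would also need kernel_fpu_begin()/kernel_fpu_end() around the
SSE code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86_STREAM_LOAD 1
#endif

#ifdef HAVE_X86_STREAM_LOAD
/* Hypothetical inner loop: one MOVNTDQA (16-byte streaming load) per
 * iteration, followed by an unaligned store to the destination. */
__attribute__((target("sse4.1")))
static void stream_copy_16(void *dst, const void *src, size_t blocks)
{
	const __m128i *s = src;
	__m128i *d = dst;

	/* MOVNTDQA faults on a source that is not 16-byte aligned. */
	assert(((uintptr_t)src & 15) == 0);

	while (blocks--) {
		__m128i v = _mm_stream_load_si128((__m128i *)s++);
		_mm_storeu_si128(d++, v);
	}
}
#endif

/*
 * Hypothetical helper: returns 1 if the streaming-load path was used,
 * 0 if it fell back to memcpy(). Either way dst ends up holding a copy
 * of src.
 */
static int memcpy_from_wc_sketch(void *dst, const void *src, size_t len)
{
#ifdef HAVE_X86_STREAM_LOAD
	if (((uintptr_t)src & 15) == 0 && (len & 15) == 0 &&
	    __builtin_cpu_supports("sse4.1")) {
		stream_copy_16(dst, src, len / 16);
		return 1;
	}
#endif
	memcpy(dst, src, len);
	return 0;
}
```

Returning whether the fast path was taken lets a caller decide what to
do with unaligned or odd-sized requests instead of silently getting the
slow copy.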

--Andy