Re: Question Regarding ERMS memcpy

From: Linus Torvalds
Date: Sun Mar 05 2017 - 15:19:47 EST


On Sun, Mar 5, 2017 at 11:54 AM, Borislav Petkov <bp@xxxxxxx> wrote:
>>
>> We seem to have broken this *really* long ago, though.
>
> I wonder why nothing blew up or failed strangely by now...

The hardware that cared was pretty broken to begin with, and I think
it was mainly some really odd graphics cards.

And from memory, they had issues with 64-bit writes. We actually have
a slow 16-bit word at a time copy for exactly these kinds of issues:
scr_memcpyw() and friends.
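
As a rough illustration of what "a 16-bit word at a time" means, here
is a minimal sketch in plain C. It is not the kernel's scr_memcpyw();
the name copy_words16 and the even-length requirement are assumptions
made up for this example. The point is only that every access is
exactly 16 bits wide and the compiler is not allowed to widen or merge
them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's scr_memcpyw()): copy `bytes`
 * bytes using only 16-bit loads and stores. The volatile qualifiers
 * keep the compiler from merging the accesses into wider ones or
 * replacing the loop with a call to memcpy().
 */
static void copy_words16(volatile uint16_t *dst,
			 const volatile uint16_t *src, size_t bytes)
{
	assert(bytes % 2 == 0);	/* assumption: whole 16-bit words only */

	for (size_t i = 0; i < bytes / 2; i++)
		dst[i] = src[i];	/* one 16-bit load, one 16-bit store */
}
```

Calling copy_words16(dst, src, 6) copies three words and leaves
anything past them untouched.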

I'd like to say that it was one of those shit server-only cards that
nobody sane would ever use (but "server hardware is validated and
better quality!"), but that might have been another issue.

>> For example, "rep movsb" really is the right thing to use on normal
>> memory on modern CPU's.
>
> So Logan's box is a SNB and it doesn't have the ERMS optimizations. Are
> you saying, regardless, we should let gcc put REP; MOVSB for smaller
> sizes?

I think gcc makes bad choices, but they are gcc's choices to make.

I gave up on gcc's "-Os" because the choices were so bad and not getting fixed.

.. and none of that has _anything_ to do with accesses to IO memory,
which is fundamentally different.


> Because gcc does generate a REP; MOVSB there when it puts its own
> memcpy, see mail upthread. (Even though that is wrong to do on iomem.)

No, *THAT* is not wrong to do on iomem. If we tell gcc that "memcpy()"
works on iomem, then gcc can damn well do whatever it wants.

"rep movsb" isn't wrong for memcpy(). Gcc may do stupid things with
it, but that's completely immaterial.

> Oh, and along with the revert we would need a big fat warning explaining
> why we need that special memcpy for IO memory.

Well, quite frankly, just a simple "IO memory is different from cached
memory" should be sufficient.
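
To make the distinction concrete, here is a hedged sketch of the kind
of copy routine that keeps the access width under the programmer's
control instead of leaving it to the compiler. The name io_copy_to32
and the choice of 32-bit stores are assumptions for illustration only;
this is not the kernel's memcpy_toio(), and real IO memory may need
barriers or other access sizes that this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy `n` bytes from normal memory to a
 * destination that must only see byte stores (unaligned head and tail)
 * and aligned 32-bit stores (the middle). The destination is accessed
 * through volatile pointers, so the compiler cannot turn the loops
 * into "rep movsb" or any other access pattern of its own choosing.
 */
static void io_copy_to32(volatile void *dst, const void *src, size_t n)
{
	volatile uint8_t *d = dst;
	const uint8_t *s = src;

	/* Byte stores until the destination is 32-bit aligned. */
	while (n && ((uintptr_t)d & 3)) {
		*d++ = *s++;
		n--;
	}

	/* Exactly one aligned 32-bit store per word. */
	while (n >= 4) {
		uint32_t w;

		/* The source is normal memory, so memcpy() is fine here. */
		memcpy(&w, s, sizeof(w));
		*(volatile uint32_t *)d = w;
		d += 4;
		s += 4;
		n -= 4;
	}

	/* Byte stores for whatever is left. */
	while (n) {
		*d++ = *s++;
		n--;
	}
}
```

The memcpy() on the source side is deliberate: the compiler may do
whatever it wants with cached memory, while every store to the
destination is spelled out.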

Linus