Re: sse for fast_clear_page()?

From: Manfred Spraul
Date: Thu Feb 08 2001 - 13:36:34 EST

Arjan van de Ven wrote:
> In article <> you wrote:
> > fast_clear_page() uses mmx instructions for clearing a page, what about
> > using sse instructions?
> > sse instructions can store 128 bit in one instruction, mmx only 64 bit.
> the sse FP registers might be lossy.

I thought that too, thus I only implemented memset(,0,) with sse.
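For illustration only (this is not the code from the thread): a minimal user-space sketch of clearing a page with 128-bit SSE stores. The names sse_clear_page and PAGE_BYTES are hypothetical, and the 4096-byte page size and GCC/Clang-style intrinsics are assumptions. Storing an all-zero register sidesteps the "lossy" question, since no loaded data passes through the register. One page takes 4096 / 16 = 256 stores, against 512 with 64-bit mmx stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: _mm_setzero_ps, _mm_store_ps */
#endif

#define PAGE_BYTES 4096  /* assumed page size (i386) */

/* Hypothetical user-space helper; not the kernel's fast_clear_page(). */
static void sse_clear_page(void *page)
{
    /* The aligned store (movaps) requires a 16-byte aligned address. */
    assert(((uintptr_t)page & 15) == 0);
#if defined(__SSE__)
    const __m128 zero = _mm_setzero_ps();
    float *p = (float *)page;

    /* Each store writes 4 floats = 16 bytes, so 256 stores per page. */
    for (size_t i = 0; i < PAGE_BYTES / sizeof(float); i += 4)
        _mm_store_ps(p + i, zero);
#else
    /* Fallback so the sketch still builds on targets without SSE. */
    memset(page, 0, PAGE_BYTES);
#endif
}
```

The caller has to pass a 16-byte aligned buffer; in the kernel, the FPU/SSE state would also have to be saved and restored around such a loop, which this sketch does not show.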

But then I found this document on Intel's website:

Intel recommends using sse registers for generic memcopy - they can't be
lossy.

> On my athlon, the in-kernel mmx
> functions are memory-bound (eg > 1 Gbyte/sec throughput)
You are using an Athlon with SDRAM?

A Pentium 4 has a 3.2 GB/s memory bus, and I saw a benchmark that compared
mmx and sse memmove, and sse was _much_ faster.

I've implemented a user space sse copy_page, and:

* mmx is the slowest version! (~12000 cpu ticks/page)
* movsd is slightly faster (~11850 cpu ticks/page). Probably due to the
  special 'rep movsd' optimization in the Pentium III (cpu notices that
  ecx is large and switches to a cache line copy mode).
* sse is the fastest version (~11500 cpu ticks/page)

Everything with cold caches.
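
A rough sketch of how such a user-space comparison could be set up (this is not the test program behind the numbers above). The names sse_copy_page, memcpy_copy_page, ticks_for_copy and PAGE_BYTES are hypothetical; GCC or Clang on x86 is assumed for __rdtsc(). The aligned SSE load/store pair moves the bits unchanged. The sketch makes no attempt to flush the caches, so it will not reproduce cold-cache figures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_load_ps, _mm_store_ps */
#endif
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>   /* __rdtsc() on GCC and Clang (assumption) */
#endif

#define PAGE_BYTES 4096  /* assumed page size (i386) */

typedef void (*copy_page_fn)(void *dst, const void *src);

/* Hypothetical SSE copy: 64 bytes per loop iteration, aligned accesses. */
static void sse_copy_page(void *dst, const void *src)
{
    assert((((uintptr_t)dst | (uintptr_t)src) & 15) == 0);
#if defined(__SSE__)
    float *d = (float *)dst;
    const float *s = (const float *)src;

    for (size_t i = 0; i < PAGE_BYTES / sizeof(float); i += 16) {
        /* Four 16-byte loads followed by four 16-byte stores. */
        __m128 r0 = _mm_load_ps(s + i);
        __m128 r1 = _mm_load_ps(s + i + 4);
        __m128 r2 = _mm_load_ps(s + i + 8);
        __m128 r3 = _mm_load_ps(s + i + 12);
        _mm_store_ps(d + i, r0);
        _mm_store_ps(d + i + 4, r1);
        _mm_store_ps(d + i + 8, r2);
        _mm_store_ps(d + i + 12, r3);
    }
#else
    memcpy(dst, src, PAGE_BYTES);   /* fallback without SSE */
#endif
}

/* Baseline to compare against. */
static void memcpy_copy_page(void *dst, const void *src)
{
    memcpy(dst, src, PAGE_BYTES);
}

/* Time one page copy in TSC ticks; returns 0 where no TSC is available. */
static uint64_t ticks_for_copy(copy_page_fn fn, void *dst, const void *src)
{
#if defined(__i386__) || defined(__x86_64__)
    uint64_t start = __rdtsc();
    fn(dst, src);
    return __rdtsc() - start;
#else
    fn(dst, src);
    return 0;
#endif
}
```

Calling ticks_for_copy(sse_copy_page, dst, src) and ticks_for_copy(memcpy_copy_page, dst, src) on 16-byte aligned buffers gives one sample each; a single sample is noisy, so any real comparison would repeat the measurement many times.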

> Userspace program for the athlon code:
I'll check it.

