Re: [PATCH] Use x86 SSE instructions for clear_page, copy_page

From: Jens Maurer
Date: Tue Aug 17 2004 - 17:42:18 EST



Arjan van de Ven wrote:
On Tue, 2004-08-17 at 08:13, Jens Maurer wrote:

The attached patch (against kernel 2.6.8.1) enables using SSE
instructions for copy_page and clear_page.

we used to have code like this in 2.4 but it got removed: the non
temperal store code is faster in a microbenchmark but has the
fundamental problem that it evics the data from the cpu cache; the
actual USE of the data thus is a LOT more expensive, result is that the
overall system performance goes down ;(

Hm... With the current clear_page, we are filling 4KB of my
Pentium-III's 16 KB L1 d-cache (i.e. 25%) with zeroes. I'm not
sure that we will use all of this data right away.

I would like to point out that the current arch/i386/lib/mmx.c
uses MMX movntq instructions #ifdef CONFIG_MK7 .
Apparently, bypassing the cache was considered a good idea
in that case.

What is a set of useful benchmarks to find out which approach
is better? We should have some real-world programs that show
significant oprofile hits in clear_page or copy_page.
It might very well be that the results on Pentium-III and
Pentium-4 are different, for example that SSE is only useful
for a Pentium-III, and only for clear_page.

Jens Maurer

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/