Re: [PATCH] Use x86 SSE instructions for clear_page, copy_page

From: Arjan van de Ven
Date: Tue Aug 17 2004 - 03:18:09 EST


On Tue, Aug 17, 2004 at 12:10:09PM +0400, Andrey Panin wrote:
> On 230, 08 17, 2004 at 09:27:51AM +0200, Arjan van de Ven wrote:
> > On Tue, 2004-08-17 at 08:13, Jens Maurer wrote:
> > > The attached patch (against kernel 2.6.8.1) enables using SSE
> > > instructions for copy_page and clear_page.
> > >
> > > A user-space test on my Pentium III 850 MHz shows a 3x speedup for
> > > clear_page (compared to the default "rep stosl"), and a 50% speedup
> > > for copy_page (compared to the default "rep movsl"). For a Pentium-4,
> > > the speedup is about 50% in both the clear_page and copy_page cases.
> >
> >
> > we used to have code like this in 2.4 but it got removed: the non
> > temperal store code is faster in a microbenchmark but has the
> > fundamental problem that it evics the data from the cpu cache; the
> > actual USE of the data thus is a LOT more expensive, result is that the
> > overall system performance goes down ;(
>
> Did SSE clear_page() suffered from this issue too ?

yes especially clear_page; the kernel only calls clear_page when it's *just
about* to use it, so it's actually the worst case example ;(

(and clear_page gains the most because non-temperal stores avoid the write
allocate so it like halves the memory bandwidth)

Attachment: pgp00000.pgp
Description: PGP signature