Re: [PATCH] add a clear_pages function to clear pages of higher order

From: Denis Vlasenko
Date: Mon Mar 21 2005 - 10:34:30 EST


On Friday 18 March 2005 21:28, Andi Kleen wrote:
> On Fri, Mar 18, 2005 at 07:00:06AM -0800, Christoph Lameter wrote:
> > On Fri, 18 Mar 2005, Denis Vlasenko wrote:
> >
> > > NT stores are not about 5% increase. 200%-300%. Provided you are ok with
> > > the fact that zeroed page ends up evicted from cache. Luckily, this is exactly
> > > what you want with prezeroing.
> >
> > These are pretty significant results. Maybe its best to use non-temporal
>
> The differences are actually less. I do not know what Denis benchmarked,
> but in my tests the difference was never more than ~10%. He got a zero
> too much?

No. See attached.

# gcc -O2 0main.c
# ./a.out
Page clear/copy benchmark program.
buffer size: 1 Mb
Each test tried 64 times, max and min CPU cycles per page are reported.
Please disregard max values. They are due to system interference only.
clear_page() tests:
normal_clear_page - took 44214 max,12615 min cycles per page
normal_clear_page - took 18969 max,12649 min cycles per page
repstosl_clear_page - took 19897 max,12655 min cycles per page
movq_clear_page - took 39391 max,10782 min cycles per page
movntq_clear_page - took 21612 max, 4779 min cycles per page

copy_page() tests:
....

I'm basically saying that 'microbenchmark-visible'
performance of NT stores is 200-300% higher than 'normal' stores.

BTW: cache eviction is not an intrisic property of non-temporal
stores. It's merely how they're implemented in current CPUs:
if NT stores hit cached line, invalidate it and
push stores to bus. Else just push stores to bus
without reading cacheline from RAM first.

It is possible that some future CPU won't evict cacheline
if NT stores happened to hit it: "if NT stores hit cached line,
MODIFY it and push stores to bus".
--
vda

Attachment: page_asm.tar.bz2
Description: application/tbz