Re: Dual athlon support?

From: Dieter Nützel (
Date: Thu Aug 03 2000 - 12:45:37 EST

D Milburn wrote:

> On Thu, 3 Aug 2000, Dieter Nützel wrote:
> > The AMD 760 chip set is UP only but has DDR-SDRAM support.
> Yeah, I realised my slip-up after I'd sent it :-/
> > gcc -O -mcpu=k6 -mpreferred-stack-boundary=2 -malign-functions=4
> > -fschedule-insns2 -fexpensive-optimizations -DMAIN -o dgemm dgemm.c
> >
> > NOTE the k6 flag!!!
> >
> > m:1000 n:1000 k:1000
> > Ail_max 24, Blj_max 12, A_row_block 85
> > Shimizu's DGEMM : 260.417 MFLOPS( 7.680 sec)
> > Shimizu's DGEMM : 260.417 MFLOPS( 7.680 sec)
> > Shimizu's DGEMM : 260.417 MFLOPS( 7.680 sec)
> >
> > Not bad I think.
> I'm unfamiliar with this benchmark... can you tell me where to get it
> please?

It is courtesy of QuantX (
I found it on a 'Cool stuff for Linux 1.5' QuantX/Samsung Semiconductor CD
for their nice Alphas.

It is for comparing FPU performance. They have some numbers for
different CPUs packed with it.

                                                Last updated date :

Copyright 1998 by Naohiko Shimizu <>

All rights reserved.

The blocking algorithm and strategy for the blocking factor is my
original. If you want to use the same algorithm or to use the same
strategy for the blocking factor, you must preserve the copyright
notice to the all programs and products which is derivable from this
program and/or description.

Contact information:

Dr. Naohiko Shimizu
School of Engr. Tokai Univ.
1117 Kitakaname Hiratsuka
Kanagawa 259-1292 Japan

I have the source. It is really small so I can post it if you like.

> Are those the optimum flags for the K6 and Athlon,

I don't know, and I would be more than happy if some AMD staff could shed
some light on it!
If you compare with the 'latest' announced 'SPEC world record for x86
FPUs' (SPECfp_base2000 value of 290 for their 1 GHz Thunderbird with their
beta Compaq-Fortran-Compiler, version 6.5), it looks good.
The peak was 302.

Note: The two benchmarks are somewhat different.

But I found the following:

> I found out that the Athlon can reach a ~33% speed up with
> '-O -mcpu=k6 -mpreferred-stack-boundary=2 -malign-functions=4
> -fschedule-insns2 -fexpensive-optimizations' on some (mixed) FPU
> intensive apps. I use this for Mesa CVS all the time.
> Yes, only 'O' and 'k6', which is curious...
> I've tested gcc-2.96 CVS which has a special 'athlon' optimization flag,
> too. It didn't come close!
> and do I need gcc-2.95.2 or higher to support the -mcpu=k6 (sorry,
> trivial questions)?

Yes. It was first introduced with pgcc, but I found some problems with pgcc
and FPU (Mesa/DRI) stuff.


BTW, I am doing my thesis in the field of 3D medical image analysis/visualization.

Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
Cognitive Systems Group
Vogt-Kölln-Straße 30
D-22527 Hamburg, Germany

email: @home:

