On a Phenom:
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : AMD Phenom(tm) 9600 Quad-Core Processor
stepping : 2
cpu MHz : 2306.997
in an M3A32-MVP DELUXE mobo, whose BIOS info I can describe only as:
Vendor: American Megatrends Inc.
Version: 0801
(based on dmidecode)
It's running Ubuntu Feisty server (powernow etc. aren't loaded), and I get the following results.
-dean
*******************************************************************************
*******************************************************************************
*******************************************************************************
* BEGAN ATLAS3.9.0 INSTALL OF SECTION 0-0-0 ON 07/19/2008 AT 09:50 *
*******************************************************************************
*******************************************************************************
*******************************************************************************
IN STAGE 1 INSTALL: SYSTEM PROBE/AUX COMPILE
Level 1 cache size calculated as 64KB.
dFPU: Separate multiply and add instructions with 4 cycle pipeline.
Apparent number of registers : 13
Register-register performance=4511.70MFLOPS
sFPU: Separate multiply and add instructions with 4 cycle pipeline.
Apparent number of registers : 13
Register-register performance=4511.70MFLOPS
IN STAGE 2 INSTALL: TYPE-DEPENDENT TUNING
STAGE 2-1: TUNING PREC='d' (precision 1 of 4)
STAGE 2-1-1 : BUILDING BLOCK MATMUL TUNE
The best matmul kernel was ATL_dmm8x1x120_L1pf.c, NB=40, written by R. Clint Whaley
Performance: 8057.51MFLOPS (349.42 percent of of detected clock rate)
(Gen case got 3928.97MFLOPS)
mmNN : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
Performance = 3839.81 (47.66 of copy matmul, 166.51 of clock)
mmNT : ma=0, lat=6, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
Performance = 3291.98 (40.86 of copy matmul, 142.76 of clock)
mmTN : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
Performance = 3799.57 (47.16 of copy matmul, 164.77 of clock)
mmTT : ma=0, lat=2, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
Performance = 3296.25 (40.91 of copy matmul, 142.94 of clock)
STAGE 2-1-2: CacheEdge DETECTION
CacheEdge set to 3145728 bytes
STAGE 2-1-3: LARGE/SMALL CASE CROSSOVER DETECTION
STAGE 2-1-3: COPY/NO-COPY CROSSOVER DETECTION
done.
STAGE 2-1-4: LEVEL 3 BLAS TUNE
done.
STAGE 2-1-5: GEMV TUNE
gemvN : chose routine 3:ATL_gemvN_1x1_1a.c written by R. Clint Whaley
Yunroll=32, Xunroll=1, using 100 percent of L1
Performance = 1394.39 (17.31 of copy matmul, 60.47 of clock)
gemvT : chose routine 105:ATL_gemvT_2x16_1.c written by R. Clint Whaley
Yunroll=2, Xunroll=16, using 100 percent of L1
Performance = 1374.56 (17.06 of copy matmul, 59.61 of clock)
STAGE 2-1-6: GER TUNE
ger : chose routine 1:ATL_ger1_axpy.c written by R. Clint Whaley
mu=16, nu=1, using 0.51 percent of L1 Cache
Performance = 809.66 (10.05 of copy matmul, 35.11 of clock)
STAGE 2-2: TUNING PREC='s' (precision 2 of 4)
STAGE 2-2-1 : BUILDING BLOCK MATMUL TUNE
The best matmul kernel was ATL_smm6x1x120_sse.c, NB=120, written by R. Clint Whaley
Performance: 15012.39MFLOPS (651.01 percent of of detected clock rate)
(Gen case got 4435.46MFLOPS)
mmNN : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
Performance = 3834.01 (25.54 of copy matmul, 166.26 of clock)
mmNT : ma=0, lat=2, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
Performance = 3370.67 (22.45 of copy matmul, 146.17 of clock)
mmTN : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
Performance = 3959.61 (26.38 of copy matmul, 171.71 of clock)
mmTT : ma=0, lat=3, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
Performance = 3486.42 (23.22 of copy matmul, 151.19 of clock)
STAGE 2-2-2: CacheEdge DETECTION
CacheEdge set to 3145728 bytes
STAGE 2-2-3: LARGE/SMALL CASE CROSSOVER DETECTION
STAGE 2-2-3: COPY/NO-COPY CROSSOVER DETECTION
done.
STAGE 2-2-4: LEVEL 3 BLAS TUNE
done.
STAGE 2-2-5: GEMV TUNE
gemvN : chose routine 9:ATL_gemvN_32x4_1.c written by R. Clint Whaley
Yunroll=32, Xunroll=4, using 100 percent of L1
Performance = 1761.79 (11.74 of copy matmul, 76.40 of clock)
gemvT : chose routine 105:ATL_gemvT_2x16_1.c written by R. Clint Whaley
Yunroll=2, Xunroll=16, using 100 percent of L1
Performance = 1984.77 (13.22 of copy matmul, 86.07 of clock)
STAGE 2-2-6: GER TUNE
ger : chose routine 1:ATL_ger1_axpy.c written by R. Clint Whaley
mu=16, nu=1, using 1.00 percent of L1 Cache
Performance = 1323.34 ( 8.81 of copy matmul, 57.39 of clock)
STAGE 2-3: TUNING PREC='z' (precision 3 of 4)
STAGE 2-3-1 : BUILDING BLOCK MATMUL TUNE
The best matmul kernel was ATL_dmm14x1x56_sse2pABC.c, NB=56, written by R. Clint Whaley
Performance: 7856.61MFLOPS (340.70 percent of of detected clock rate)
(Gen case got 4166.97MFLOPS)
mmNN : ma=0, lat=4, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
Performance = 3946.90 (50.24 of copy matmul, 171.16 of clock)
mmNT : ma=0, lat=8, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
Performance = 3589.62 (45.69 of copy matmul, 155.66 of clock)
mmTN : ma=0, lat=2, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
Performance = 3959.21 (50.39 of copy matmul, 171.69 of clock)
mmTT : ma=0, lat=4, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
Performance = 3599.80 (45.82 of copy matmul, 156.11 of clock)
STAGE 2-3-2: CacheEdge DETECTION
CacheEdge set to 3145728 bytes
zdNKB set to 0 bytes
STAGE 2-3-3: LARGE/SMALL CASE CROSSOVER DETECTION
STAGE 2-3-3: COPY/NO-COPY CROSSOVER DETECTION
done.
STAGE 2-3-4: LEVEL 3 BLAS TUNE
done.
STAGE 2-3-5: GEMV TUNE
gemvN : chose routine 3:ATL_cgemvN_1x1_1a.c written by R. Clint Whaley
Yunroll=32, Xunroll=1, using 99 percent of L1
Performance = 2835.03 (36.08 of copy matmul, 122.94 of clock)
gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
Yunroll=2, Xunroll=8, using 99 percent of L1
Performance = 2116.02 (26.93 of copy matmul, 91.76 of clock)
STAGE 2-3-6: GER TUNE
ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
mu=16, nu=1, using 0.76 percent of L1 Cache
Performance = 1609.07 (20.48 of copy matmul, 69.78 of clock)
STAGE 2-4: TUNING PREC='c' (precision 4 of 4)
STAGE 2-4-1 : BUILDING BLOCK MATMUL TUNE
The best matmul kernel was ATL_smm6x1x120_sse.c, NB=120, written by R. Clint Whaley
Performance: 14625.27MFLOPS (634.23 percent of of detected clock rate)
(Gen case got 4415.67MFLOPS)
mmNN : ma=0, lat=8, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
Performance = 3934.66 (26.90 of copy matmul, 170.63 of clock)
mmNT : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
Performance = 3615.18 (24.72 of copy matmul, 156.77 of clock)
mmTN : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
Performance = 3953.05 (27.03 of copy matmul, 171.42 of clock)
mmTT : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
Performance = 3678.05 (25.15 of copy matmul, 159.50 of clock)
STAGE 2-4-2: CacheEdge DETECTION
CacheEdge set to 3145728 bytes
csNKB set to 0 bytes
STAGE 2-4-3: LARGE/SMALL CASE CROSSOVER DETECTION
STAGE 2-4-3: COPY/NO-COPY CROSSOVER DETECTION
done.
STAGE 2-4-4: LEVEL 3 BLAS TUNE
done.
STAGE 2-4-5: GEMV TUNE
gemvN : chose routine 3:ATL_cgemvN_1x1_1a.c written by R. Clint Whaley
Yunroll=32, Xunroll=1, using 86 percent of L1
Performance = 5542.02 (37.89 of copy matmul, 240.33 of clock)
gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
Yunroll=2, Xunroll=8, using 86 percent of L1
Performance = 2548.30 (17.42 of copy matmul, 110.51 of clock)
STAGE 2-4-6: GER TUNE
ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
mu=16, nu=1, using 0.75 percent of L1 Cache
Performance = 3173.71 (21.70 of copy matmul, 137.63 of clock)
STAGE 3: GENERAL LIBRARY BUILD
STAGE 4: POST-BUILD TUNING
done.
STAGE 4-2: Threading install
done.
*******************************************************************************
*******************************************************************************
*******************************************************************************
* FINISHED ATLAS3.9.0 INSTALL OF SECTION 0-0-0 ON 07/19/2008 AT 10:02 *
*******************************************************************************
*******************************************************************************
*******************************************************************************
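
For anyone reading the percentages in the log: a minimal sketch of how the two ratios on each "Performance" line appear to be derived, using only figures copied from the output above. The helper name `pct` is hypothetical, and the clock used here is the `cpu MHz` value from /proc/cpuinfo, which is an assumption; ATLAS measures its own clock rate, so the "of clock" figures only agree to within a fraction of a percent.

```python
# Figures copied from the install log above (double precision, stage 2-1-1).
CPU_MHZ = 2306.997     # from /proc/cpuinfo; ATLAS's detected clock may differ slightly
BEST_DMM = 8057.51     # best matmul kernel, MFLOPS
GEN_MMNN = 3839.81     # generated mmNN case, MFLOPS


def pct(value, reference):
    """Return value expressed as a percentage of reference (hypothetical helper)."""
    return 100.0 * value / reference


# "of copy matmul": generated case relative to the best copy kernel.
of_copy = pct(GEN_MMNN, BEST_DMM)

# "of clock": MFLOPS relative to MHz, i.e. 100 x flops per cycle.
of_clock_best = pct(BEST_DMM, CPU_MHZ)
of_clock_gen = pct(GEN_MMNN, CPU_MHZ)

print(f"mmNN vs copy matmul : {of_copy:.2f} percent")
print(f"best kernel vs clock: {of_clock_best:.2f} percent")
print(f"mmNN vs clock       : {of_clock_gen:.2f} percent")
```

Read this way, the best double-precision kernel is sustaining roughly 3.5 flops per cycle, and the generated mmNN case a little under half of that.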
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel