Re: k10h post-BIOS patch effects

by dean gaudet-2

on a phenom:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9600 Quad-Core Processor
stepping        : 2
cpu MHz         : 2306.997

in a M3A32-MVP DELUXE mobo ... whose bios info i can describe only as:

        Vendor: American Megatrends Inc.
        Version: 0801

(based on dmidecode)

it's running ubuntu feisty server (powernow etc. aren't loaded), and
i get the following results.

-dean


*******************************************************************************
*******************************************************************************
*******************************************************************************
*       BEGAN ATLAS3.9.0  INSTALL OF SECTION 0-0-0 ON 07/19/2008 AT 09:50     *
*******************************************************************************
*******************************************************************************
*******************************************************************************





IN STAGE 1 INSTALL:  SYSTEM PROBE/AUX COMPILE
   Level 1 cache size calculated as 64KB.

   dFPU: Separate multiply and add instructions with 4 cycle pipeline.
         Apparent number of registers : 13
         Register-register performance=4511.70MFLOPS
   sFPU: Separate multiply and add instructions with 4 cycle pipeline.
         Apparent number of registers : 13
         Register-register performance=4511.70MFLOPS


IN STAGE 2 INSTALL:  TYPE-DEPENDENT TUNING


STAGE 2-1: TUNING PREC='d' (precision 1 of 4)


   STAGE 2-1-1 : BUILDING BLOCK MATMUL TUNE
      The best matmul kernel was ATL_dmm8x1x120_L1pf.c, NB=40, written by R. Clint Whaley
      Performance: 8057.51MFLOPS (349.42 percent of of detected clock rate)
        (Gen case got 3928.97MFLOPS)
      mmNN   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
               Performance = 3839.81 (47.66 of copy matmul, 166.51 of clock)
      mmNT   : ma=0, lat=6, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
               Performance = 3291.98 (40.86 of copy matmul, 142.76 of clock)
      mmTN   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
               Performance = 3799.57 (47.16 of copy matmul, 164.77 of clock)
      mmTT   : ma=0, lat=2, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
               Performance = 3296.25 (40.91 of copy matmul, 142.94 of clock)



   STAGE 2-1-2: CacheEdge DETECTION
      CacheEdge set to 3145728 bytes


   STAGE 2-1-3: LARGE/SMALL CASE CROSSOVER DETECTION


   STAGE 2-1-3: COPY/NO-COPY CROSSOVER DETECTION
      done.


   STAGE 2-1-4: LEVEL 3 BLAS TUNE
      done.


   STAGE 2-1-5: GEMV TUNE
      gemvN : chose routine 3:ATL_gemvN_1x1_1a.c written by R. Clint Whaley
              Yunroll=32, Xunroll=1, using 100 percent of L1
              Performance = 1394.39 (17.31 of copy matmul, 60.47 of clock)
      gemvT : chose routine 105:ATL_gemvT_2x16_1.c written by R. Clint Whaley
              Yunroll=2, Xunroll=16, using 100 percent of L1
              Performance = 1374.56 (17.06 of copy matmul, 59.61 of clock)


   STAGE 2-1-6: GER TUNE
      ger : chose routine 1:ATL_ger1_axpy.c written by R. Clint Whaley
            mu=16, nu=1, using  0.51 percent of L1 Cache
              Performance = 809.66 (10.05 of copy matmul, 35.11 of clock)


STAGE 2-2: TUNING PREC='s' (precision 2 of 4)


   STAGE 2-2-1 : BUILDING BLOCK MATMUL TUNE
      The best matmul kernel was ATL_smm6x1x120_sse.c, NB=120, written by R. Clint Whaley
      Performance: 15012.39MFLOPS (651.01 percent of of detected clock rate)
        (Gen case got 4435.46MFLOPS)
      mmNN   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3834.01 (25.54 of copy matmul, 166.26 of clock)
      mmNT   : ma=0, lat=2, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3370.67 (22.45 of copy matmul, 146.17 of clock)
      mmTN   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3959.61 (26.38 of copy matmul, 171.71 of clock)
      mmTT   : ma=0, lat=3, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3486.42 (23.22 of copy matmul, 151.19 of clock)



   STAGE 2-2-2: CacheEdge DETECTION
      CacheEdge set to 3145728 bytes


   STAGE 2-2-3: LARGE/SMALL CASE CROSSOVER DETECTION


   STAGE 2-2-3: COPY/NO-COPY CROSSOVER DETECTION
      done.


   STAGE 2-2-4: LEVEL 3 BLAS TUNE
      done.


   STAGE 2-2-5: GEMV TUNE
      gemvN : chose routine 9:ATL_gemvN_32x4_1.c written by R. Clint Whaley
              Yunroll=32, Xunroll=4, using 100 percent of L1
              Performance = 1761.79 (11.74 of copy matmul, 76.40 of clock)
      gemvT : chose routine 105:ATL_gemvT_2x16_1.c written by R. Clint Whaley
              Yunroll=2, Xunroll=16, using 100 percent of L1
              Performance = 1984.77 (13.22 of copy matmul, 86.07 of clock)


   STAGE 2-2-6: GER TUNE
      ger : chose routine 1:ATL_ger1_axpy.c written by R. Clint Whaley
            mu=16, nu=1, using  1.00 percent of L1 Cache
              Performance = 1323.34 ( 8.81 of copy matmul, 57.39 of clock)


STAGE 2-3: TUNING PREC='z' (precision 3 of 4)


   STAGE 2-3-1 : BUILDING BLOCK MATMUL TUNE
      The best matmul kernel was ATL_dmm14x1x56_sse2pABC.c, NB=56, written by R. Clint Whaley
      Performance: 7856.61MFLOPS (340.70 percent of of detected clock rate)
        (Gen case got 4166.97MFLOPS)
      mmNN   : ma=0, lat=4, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
               Performance = 3946.90 (50.24 of copy matmul, 171.16 of clock)
      mmNT   : ma=0, lat=8, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
               Performance = 3589.62 (45.69 of copy matmul, 155.66 of clock)
      mmTN   : ma=0, lat=2, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
               Performance = 3959.21 (50.39 of copy matmul, 171.69 of clock)
      mmTT   : ma=0, lat=4, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
               Performance = 3599.80 (45.82 of copy matmul, 156.11 of clock)



   STAGE 2-3-2: CacheEdge DETECTION
      CacheEdge set to 3145728 bytes
      zdNKB set to 0 bytes


   STAGE 2-3-3: LARGE/SMALL CASE CROSSOVER DETECTION


   STAGE 2-3-3: COPY/NO-COPY CROSSOVER DETECTION
      done.


   STAGE 2-3-4: LEVEL 3 BLAS TUNE
      done.


   STAGE 2-3-5: GEMV TUNE
      gemvN : chose routine 3:ATL_cgemvN_1x1_1a.c written by R. Clint Whaley
              Yunroll=32, Xunroll=1, using 99 percent of L1
              Performance = 2835.03 (36.08 of copy matmul, 122.94 of clock)
      gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
              Yunroll=2, Xunroll=8, using 99 percent of L1
              Performance = 2116.02 (26.93 of copy matmul, 91.76 of clock)


   STAGE 2-3-6: GER TUNE
      ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
            mu=16, nu=1, using  0.76 percent of L1 Cache
              Performance = 1609.07 (20.48 of copy matmul, 69.78 of clock)


STAGE 2-4: TUNING PREC='c' (precision 4 of 4)


   STAGE 2-4-1 : BUILDING BLOCK MATMUL TUNE
      The best matmul kernel was ATL_smm6x1x120_sse.c, NB=120, written by R. Clint Whaley
      Performance: 14625.27MFLOPS (634.23 percent of of detected clock rate)
        (Gen case got 4415.67MFLOPS)
      mmNN   : ma=0, lat=8, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3934.66 (26.90 of copy matmul, 170.63 of clock)
      mmNT   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3615.18 (24.72 of copy matmul, 156.77 of clock)
      mmTN   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3953.05 (27.03 of copy matmul, 171.42 of clock)
      mmTT   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3678.05 (25.15 of copy matmul, 159.50 of clock)



   STAGE 2-4-2: CacheEdge DETECTION
      CacheEdge set to 3145728 bytes
      csNKB set to 0 bytes


   STAGE 2-4-3: LARGE/SMALL CASE CROSSOVER DETECTION


   STAGE 2-4-3: COPY/NO-COPY CROSSOVER DETECTION
      done.


   STAGE 2-4-4: LEVEL 3 BLAS TUNE
      done.


   STAGE 2-4-5: GEMV TUNE
      gemvN : chose routine 3:ATL_cgemvN_1x1_1a.c written by R. Clint Whaley
              Yunroll=32, Xunroll=1, using 86 percent of L1
              Performance = 5542.02 (37.89 of copy matmul, 240.33 of clock)
      gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
              Yunroll=2, Xunroll=8, using 86 percent of L1
              Performance = 2548.30 (17.42 of copy matmul, 110.51 of clock)


   STAGE 2-4-6: GER TUNE
      ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
            mu=16, nu=1, using  0.75 percent of L1 Cache
              Performance = 3173.71 (21.70 of copy matmul, 137.63 of clock)


STAGE 3: GENERAL LIBRARY BUILD


STAGE 4: POST-BUILD TUNING
   done.


STAGE 4-2: Threading install
   done.

*******************************************************************************
*******************************************************************************
*******************************************************************************
*      FINISHED ATLAS3.9.0  INSTALL OF SECTION 0-0-0 ON 07/19/2008 AT 10:02   *
*******************************************************************************
*******************************************************************************
*******************************************************************************
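
The "of clock" figures in the log above appear to be MFLOPS divided by the detected clock rate in MHz, times 100. The sketch below pulls the numbers out of a "Performance = ..." line and recomputes that percentage. The 2306.0 MHz clock is an assumption inferred from the log's own ratios (it is close to the 2306.997 that /proc/cpuinfo reports), and the names `parse_perf` and `percent_of_clock` are made up for illustration; they are not part of ATLAS.

```python
import re

# Assumed detected clock in MHz, inferred from the log's ratios (not printed by ATLAS here).
CLOCK_MHZ = 2306.0

# Matches lines such as:
#   Performance = 3839.81 (47.66 of copy matmul, 166.51 of clock)
PERF_RE = re.compile(
    r"Performance = \s*([\d.]+) \(\s*([\d.]+) of copy matmul, \s*([\d.]+) of clock\)"
)


def parse_perf(line):
    """Return (mflops, pct_of_copy_matmul, pct_of_clock) or None if the line does not match."""
    m = PERF_RE.search(line)
    if m is None:
        return None
    return tuple(float(g) for g in m.groups())


def percent_of_clock(mflops, clock_mhz=CLOCK_MHZ):
    """Recompute the 'of clock' percentage from MFLOPS and a clock rate in MHz."""
    return 100.0 * mflops / clock_mhz
```

Feeding each parsed MFLOPS value back through `percent_of_clock` and comparing it with the percentage printed in the log is a quick way to see which clock rate the install actually detected.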



