Re: k10h post-BIOS patch effects

by dean gaudet-2

btw i should add:

# for cpu in `awk '/^processor/ {print $3}' /proc/cpuinfo`; do (echo $cpu;
rdmsr $cpu 0xc0010015; rdmsr $cpu 0xc0011023) | fmt -w1000; done
0 0x0000000001000010 0x0000000000200020
1 0x0000000001000010 0x0000000000200020
2 0x0000000001000010 0x0000000000200020
3 0x0000000001000010 0x0000000000200020

so neither the erratum 298 nor the erratum 309 workaround was enabled...  and this is a B2 part...
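
a minimal sketch of how the two values above can be decoded bit by bit. the bit positions are assumptions recalled from AMD's family 10h revision guide, not something stated in this thread: the 298 workaround is taken to set bit 3 of MSR 0xc0010015 and bit 1 of MSR 0xc0011023, and the 309 workaround bit 23 of MSR 0xc0011023. bit_is_set is a hypothetical helper, and the values are copied from the output rather than read with rdmsr, so this runs without root.

```shell
#!/bin/sh
# bit_is_set VALUE BIT: print 1 if bit BIT of VALUE is set, else 0.
# (hypothetical helper, plain shell arithmetic)
bit_is_set() {
    echo $(( ($1 >> $2) & 1 ))
}

# values copied from the rdmsr output above (identical on all four cores)
hwcr=0x0000000001000010      # MSR 0xc0010015
msr1023=0x0000000000200020   # MSR 0xc0011023

# ASSUMPTION: these bit positions are recalled from the family 10h
# revision guide; verify them against the guide before relying on them.
echo "298 workaround, 0xc0010015 bit 3:  $(bit_is_set $hwcr 3)"
echo "298 workaround, 0xc0011023 bit 1:  $(bit_is_set $msr1023 1)"
echo "309 workaround, 0xc0011023 bit 23: $(bit_is_set $msr1023 23)"
```

under those assumptions all three lines print 0, which matches the reading above. the bits that are set in the two values are 4 and 24 in 0xc0010015 and 5 and 21 in 0xc0011023.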

# setpci -d 1022:1204 64.l
00000000

nor was HTC.

-dean


On Sat, 19 Jul 2008, dean gaudet wrote:

> on a phenom:
>
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 16
> model           : 2
> model name      : AMD Phenom(tm) 9600 Quad-Core Processor
> stepping        : 2
> cpu MHz         : 2306.997
>
> in a M3A32-MVP DELUXE mobo ... whose bios info i can describe only as:
>
>         Vendor: American Megatrends Inc.
>         Version: 0801
>
> (based on dmidecode)
>
> it's running ubuntu feisty server (and powernow/etc aren't loaded)
>
> i get the following results.
>
> -dean
>
>
> *******************************************************************************
> *******************************************************************************
> *******************************************************************************
> *       BEGAN ATLAS3.9.0  INSTALL OF SECTION 0-0-0 ON 07/19/2008 AT 09:50     *
> *******************************************************************************
> *******************************************************************************
> *******************************************************************************
>
>
>
>
>
> IN STAGE 1 INSTALL:  SYSTEM PROBE/AUX COMPILE
>    Level 1 cache size calculated as 64KB.
>
>    dFPU: Separate multiply and add instructions with 4 cycle pipeline.
>          Apparent number of registers : 13
>          Register-register performance=4511.70MFLOPS
>    sFPU: Separate multiply and add instructions with 4 cycle pipeline.
>          Apparent number of registers : 13
>          Register-register performance=4511.70MFLOPS
>
>
> IN STAGE 2 INSTALL:  TYPE-DEPENDENT TUNING
>
>
> STAGE 2-1: TUNING PREC='d' (precision 1 of 4)
>
>
>    STAGE 2-1-1 : BUILDING BLOCK MATMUL TUNE
>       The best matmul kernel was ATL_dmm8x1x120_L1pf.c, NB=40, written by R. Clint Whaley
>       Performance: 8057.51MFLOPS (349.42 percent of of detected clock rate)
>         (Gen case got 3928.97MFLOPS)
>       mmNN   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
>                Performance = 3839.81 (47.66 of copy matmul, 166.51 of clock)
>       mmNT   : ma=0, lat=6, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
>                Performance = 3291.98 (40.86 of copy matmul, 142.76 of clock)
>       mmTN   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
>                Performance = 3799.57 (47.16 of copy matmul, 164.77 of clock)
>       mmTT   : ma=0, lat=2, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
>                Performance = 3296.25 (40.91 of copy matmul, 142.94 of clock)
>
>
>
>    STAGE 2-1-2: CacheEdge DETECTION
>       CacheEdge set to 3145728 bytes
>
>
>    STAGE 2-1-3: LARGE/SMALL CASE CROSSOVER DETECTION
>
>
>    STAGE 2-1-3: COPY/NO-COPY CROSSOVER DETECTION
>       done.
>
>
>    STAGE 2-1-4: LEVEL 3 BLAS TUNE
>       done.
>
>
>    STAGE 2-1-5: GEMV TUNE
>       gemvN : chose routine 3:ATL_gemvN_1x1_1a.c written by R. Clint Whaley
>               Yunroll=32, Xunroll=1, using 100 percent of L1
>               Performance = 1394.39 (17.31 of copy matmul, 60.47 of clock)
>       gemvT : chose routine 105:ATL_gemvT_2x16_1.c written by R. Clint Whaley
>               Yunroll=2, Xunroll=16, using 100 percent of L1
>               Performance = 1374.56 (17.06 of copy matmul, 59.61 of clock)
>
>
>    STAGE 2-1-6: GER TUNE
>       ger : chose routine 1:ATL_ger1_axpy.c written by R. Clint Whaley
>             mu=16, nu=1, using  0.51 percent of L1 Cache
>               Performance = 809.66 (10.05 of copy matmul, 35.11 of clock)
>
>
> STAGE 2-2: TUNING PREC='s' (precision 2 of 4)
>
>
>    STAGE 2-2-1 : BUILDING BLOCK MATMUL TUNE
>       The best matmul kernel was ATL_smm6x1x120_sse.c, NB=120, written by R. Clint Whaley
>       Performance: 15012.39MFLOPS (651.01 percent of of detected clock rate)
>         (Gen case got 4435.46MFLOPS)
>       mmNN   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3834.01 (25.54 of copy matmul, 166.26 of clock)
>       mmNT   : ma=0, lat=2, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3370.67 (22.45 of copy matmul, 146.17 of clock)
>       mmTN   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3959.61 (26.38 of copy matmul, 171.71 of clock)
>       mmTT   : ma=0, lat=3, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3486.42 (23.22 of copy matmul, 151.19 of clock)
>
>
>
>    STAGE 2-2-2: CacheEdge DETECTION
>       CacheEdge set to 3145728 bytes
>
>
>    STAGE 2-2-3: LARGE/SMALL CASE CROSSOVER DETECTION
>
>
>    STAGE 2-2-3: COPY/NO-COPY CROSSOVER DETECTION
>       done.
>
>
>    STAGE 2-2-4: LEVEL 3 BLAS TUNE
>       done.
>
>
>    STAGE 2-2-5: GEMV TUNE
>       gemvN : chose routine 9:ATL_gemvN_32x4_1.c written by R. Clint Whaley
>               Yunroll=32, Xunroll=4, using 100 percent of L1
>               Performance = 1761.79 (11.74 of copy matmul, 76.40 of clock)
>       gemvT : chose routine 105:ATL_gemvT_2x16_1.c written by R. Clint Whaley
>               Yunroll=2, Xunroll=16, using 100 percent of L1
>               Performance = 1984.77 (13.22 of copy matmul, 86.07 of clock)
>
>
>    STAGE 2-2-6: GER TUNE
>       ger : chose routine 1:ATL_ger1_axpy.c written by R. Clint Whaley
>             mu=16, nu=1, using  1.00 percent of L1 Cache
>               Performance = 1323.34 ( 8.81 of copy matmul, 57.39 of clock)
>
>
> STAGE 2-3: TUNING PREC='z' (precision 3 of 4)
>
>
>    STAGE 2-3-1 : BUILDING BLOCK MATMUL TUNE
>       The best matmul kernel was ATL_dmm14x1x56_sse2pABC.c, NB=56, written by R. Clint Whaley
>       Performance: 7856.61MFLOPS (340.70 percent of of detected clock rate)
>         (Gen case got 4166.97MFLOPS)
>       mmNN   : ma=0, lat=4, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
>                Performance = 3946.90 (50.24 of copy matmul, 171.16 of clock)
>       mmNT   : ma=0, lat=8, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
>                Performance = 3589.62 (45.69 of copy matmul, 155.66 of clock)
>       mmTN   : ma=0, lat=2, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
>                Performance = 3959.21 (50.39 of copy matmul, 171.69 of clock)
>       mmTT   : ma=0, lat=4, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
>                Performance = 3599.80 (45.82 of copy matmul, 156.11 of clock)
>
>
>
>    STAGE 2-3-2: CacheEdge DETECTION
>       CacheEdge set to 3145728 bytes
>       zdNKB set to 0 bytes
>
>
>    STAGE 2-3-3: LARGE/SMALL CASE CROSSOVER DETECTION
>
>
>    STAGE 2-3-3: COPY/NO-COPY CROSSOVER DETECTION
>       done.
>
>
>    STAGE 2-3-4: LEVEL 3 BLAS TUNE
>       done.
>
>
>    STAGE 2-3-5: GEMV TUNE
>       gemvN : chose routine 3:ATL_cgemvN_1x1_1a.c written by R. Clint Whaley
>               Yunroll=32, Xunroll=1, using 99 percent of L1
>               Performance = 2835.03 (36.08 of copy matmul, 122.94 of clock)
>       gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
>               Yunroll=2, Xunroll=8, using 99 percent of L1
>               Performance = 2116.02 (26.93 of copy matmul, 91.76 of clock)
>
>
>    STAGE 2-3-6: GER TUNE
>       ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
>             mu=16, nu=1, using  0.76 percent of L1 Cache
>               Performance = 1609.07 (20.48 of copy matmul, 69.78 of clock)
>
>
> STAGE 2-4: TUNING PREC='c' (precision 4 of 4)
>
>
>    STAGE 2-4-1 : BUILDING BLOCK MATMUL TUNE
>       The best matmul kernel was ATL_smm6x1x120_sse.c, NB=120, written by R. Clint Whaley
>       Performance: 14625.27MFLOPS (634.23 percent of of detected clock rate)
>         (Gen case got 4415.67MFLOPS)
>       mmNN   : ma=0, lat=8, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3934.66 (26.90 of copy matmul, 170.63 of clock)
>       mmNT   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3615.18 (24.72 of copy matmul, 156.77 of clock)
>       mmTN   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3953.05 (27.03 of copy matmul, 171.42 of clock)
>       mmTT   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3678.05 (25.15 of copy matmul, 159.50 of clock)
>
>
>
>    STAGE 2-4-2: CacheEdge DETECTION
>       CacheEdge set to 3145728 bytes
>       csNKB set to 0 bytes
>
>
>    STAGE 2-4-3: LARGE/SMALL CASE CROSSOVER DETECTION
>
>
>    STAGE 2-4-3: COPY/NO-COPY CROSSOVER DETECTION
>       done.
>
>
>    STAGE 2-4-4: LEVEL 3 BLAS TUNE
>       done.
>
>
>    STAGE 2-4-5: GEMV TUNE
>       gemvN : chose routine 3:ATL_cgemvN_1x1_1a.c written by R. Clint Whaley
>               Yunroll=32, Xunroll=1, using 86 percent of L1
>               Performance = 5542.02 (37.89 of copy matmul, 240.33 of clock)
>       gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
>               Yunroll=2, Xunroll=8, using 86 percent of L1
>               Performance = 2548.30 (17.42 of copy matmul, 110.51 of clock)
>
>
>    STAGE 2-4-6: GER TUNE
>       ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
>             mu=16, nu=1, using  0.75 percent of L1 Cache
>               Performance = 3173.71 (21.70 of copy matmul, 137.63 of clock)
>
>
> STAGE 3: GENERAL LIBRARY BUILD
>
>
> STAGE 4: POST-BUILD TUNING
>    done.
>
>
> STAGE 4-2: Threading install
>    done.
>
> *******************************************************************************
> *******************************************************************************
> *******************************************************************************
> *      FINISHED ATLAS3.9.0  INSTALL OF SECTION 0-0-0 ON 07/19/2008 AT 10:02   *
> *******************************************************************************
> *******************************************************************************
> *******************************************************************************
>
>
>
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Math-atlas-devel mailing list
> Math-atlas-devel@...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>

