k10h post-BIOS patch effects

k10h post-BIOS patch effects

by Clint Whaley

Guys,

One of the things due for 3.9 is new kernels for Core2Duo and AMD
Phenom/3rdgenOpteron.  Since my Phenom continues to randomly slow down
to half speed, I recently returned to the OpteronK10h for my first timings
since applying the BIOS patch to fix the k10h TLB errata.  It is not pretty!

My timings indicate as much as a 20% drop in performance in **GEMM**, which
is not really dominated by memory costs.  Level 2 performance dropped
something more like 30%.  The funny part was the AMD guy reassured me the
performance effects had been way overblown by fringe internet weirdos . . .

Anyway, the good news is that the machine no longer crashes once a day
(apparently a large threaded DGEMM is just as good as virtualization at
crashing a processor with this errata), the bad news is that the performance
is horrible.  I now own two AMD K10h, neither of which I can trust for tuning.
Fabulous.

Cheers,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel

Re: k10h post-BIOS patch effects

by Mikhail Kuzminsky

In message from Clint Whaley <whaley@...> (Wed, 16 Jul 2008
09:34:48 -0500):
>Guys,
>
>One of the things due for 3.9 is new kernels for Core2Duo and AMD
>Phenom/3rdgenOpteron.  Since my Phenom continues to randomly slow
>down
>to half speed,
Which Linux distro do you use? You should kill all the
power-management daemons (like powersaved in SuSE) and remove the
corresponding kernel modules, such as cpufreq.

> I recently returned to the OpteronK10h for my first
>timings
>since applying the BIOS patch to fix the k10h TLB errata.  It is not
>pretty!
>
>My timings indicate as much as a 20% drop in performance in **GEMM**,
>which
>is not really dominated by memory costs.  Level 2 performance dropped
>something more like 30%.  The funny part was the AMD guy reassured me
>the
>performance effects had been way overblown by fringe internet weirdos
>. . .

The BIOS patch was documented as causing some performance degradation.
The better choice is to patch the Linux kernel (a patch was published on
the AMD x86-64 mailing list); it should cause only a minor performance
decrease.

Yours
Mikhail Kuzminsky
Computer Assistance to Chemical Research Center
Zelinsky Institute of Organic Chemistry
Moscow

>
>Anyway, the good news is that the machine no longer crashes once a
>day
>(apparently a large threaded DGEMM is just as good as virtualization
>at
>crashing a processor with this errata), the bad news is that the
>performance
>is horrible.  I now own two AMD K10h, neither of which I can trust
>for tuning.
>Fabulous.
>
>Cheers,
>Clint

Re: k10h post-BIOS patch effects

by Clint Whaley

Guys,

>Which Linux distro do you use ? You should kill all the
>"power-sensual" daemons (like powersaved in SuSE) and remove the
>corresponding kernel daemons like cpufreq.

I tried two Linux distros: Kubuntu Hardy Heron & Fedora Core 9.  FC9 does
it a *lot* less than Kubuntu, but it still does it.  I have turned off
"cool & quiet" in the BIOS, and cpuinfo shows full speed even as my
timings drop by half.  I verified that cpufreq doesn't work after the BIOS
turnoff (the scaling directories are missing from ACPI).
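A check like the one described above can be scripted. This is only a sketch: it assumes the scaling directories normally appear as a cpufreq subdirectory under /sys/devices/system/cpu/cpuN, and the function name check_cpufreq is made up for illustration.

```shell
#!/bin/bash
# Report, for each CPU directory under a sysfs root, whether a cpufreq
# scaling directory is present. The default root is the usual Linux
# location (an assumption); pass another root as $1 to test against a copy.
check_cpufreq() {
    local root="${1:-/sys/devices/system/cpu}"
    local cpu
    for cpu in "$root"/cpu[0-9]*; do
        # Skip the unexpanded glob if no CPU directories exist.
        [ -d "$cpu" ] || continue
        if [ -d "$cpu/cpufreq" ]; then
            echo "$(basename "$cpu"): cpufreq present"
        else
            echo "$(basename "$cpu"): cpufreq missing"
        fi
    done
}

check_cpufreq "$@"
```

If every CPU reports "cpufreq missing", frequency scaling is not being driven through cpufreq, which matches the situation described in this message.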

I more & more suspect the problem is in the motherboard.  Dean (I think it was)
mentioned that thermal throttling is broken in the Phenom; I wonder if
the mobo assumes it works and does some voltage things it can't handle
in response to OS calls.

Whatever it is, it is affected by OS, so it is not pure hardware.  But,
I wonder if the OS sends some signal that the mobo should ignore,
but instead attempts something the k10h can't do . . .

Anyway, if anyone can tell me OS & mobo combinations that they have seen
work for the Phenom, I'd appreciate it.

>BIOS patch was declared as leading to some performance degradation.

Yeah, but I did not expect that massive die-off for a cache-dominated
algorithm like GEMM.  For HPC, the slowdown is massive and pervasive,
but the TLB bug is triggered daily.

>The better choice is to patch Linux kernel (the patch was published on
>AMD x86-64 electronic conference) - it must give minor performance
>decrease.

Last time I checked, this was not in the standard Linux kernel, or even
a supported patch, but just some example code on some mailing list, where
the AMD guy says, "I wouldn't use this if I were you" . . .

Cheers,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************


Re: atlas-3.9.0

by Mikhail Kuzminsky

I just installed version 3.9.0 on an Opteron 2350,
but "make ptcheck" gives me:
gfortran -fomit-frame-pointer -mfpmath=sse -msse3 -O2 -falign-loops=32 -m64 -o xsinvtst_pt sinvtst_pt.o \
    /usr/local/atlas_390_opteron235x/lib/libtstatlas.a /usr/local/atlas_390_opteron235x/lib/liblapack.a /usr/local/atlas_390_opteron235x/lib/libptcblas.a /usr/local/atlas_390_opteron235x/lib/libptf77blas.a \
    /usr/local/atlas_390_opteron235x/lib/libatlas.a -lpthread -lm
/usr/local/atlas_390_opteron235x/lib/libatlas.a(ATL_ptflushcache.o): In function `ATL_ptFlushAreasByCL':
ATL_ptflushcache.c:(.text+0xc8): undefined reference to `ATL_FlushAreaByCL'
collect2: ld returned 1 exit status
make[3]: *** [xsinvtst_pt] Error 1
make[3]: Leaving directory `/usr/local/atlas_390_opteron235x/bin'
make[2]: *** [ptsanity_test] Error 2
make[2]: Leaving directory `/usr/local/atlas_390_opteron235x/bin'
make[1]: *** [ptsanity_test] Error 2
make[1]: Leaving directory `/usr/local/atlas_390_opteron235x'
make: *** [pttest] Error 2

BTW, what is the default prefix value for configure:
/usr/local/atlas or /usr/local/ATLAS?

Mikhail Kuzminsky
Computer Assistance to Chemical Research Center
Zelinsky Institute of Organic Chemistry
Moscow



Re: atlas-3.9.0

by Clint Whaley

>but "make ptcheck"
>gives me:
>ATL_ptflushcache.c:(.text+0xc8): undefined reference to
>`ATL_FlushAreaByCL'

Had two errors in this routine.  See:
   https://sourceforge.net/tracker/index.php?func=detail&aid=2021878&group_id=23725&atid=379482

>BTW, what is default prefix value for configure-
>/usr/local/atlas or /usr/local/ATLAS ?

/usr/local/atlas

Cheers,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************


Re: atlas-3.9.0

by Mikhail Kuzminsky

In message from Clint Whaley <whaley@...> (Fri, 18 Jul 2008
17:20:58 -0500):
>>BTW, what is default prefix value for configure-
>>/usr/local/atlas or /usr/local/ATLAS ?
>
>/usr/local/atlas

Thanks !
Then is it possible to run 4 simultaneous "make build" commands with ONE
"shared" source directory tree and 4 different target directories,
i.e. something like

#! /bin/bash
# target directories are /home/local/atlas1 etc
# I assume that the 4 configuration steps, one for each target tree,
# were performed before this run
#
echo "start"
# stdout is redirected first, then stderr, so both end up in the log
(cd /home/local/atlas1; numactl --membind=0 --cpunodebind=0 make build > makebuild_1.log 2>&1) &
(cd /home/local/atlas2; numactl --membind=0 --cpunodebind=0 make build > makebuild_2.log 2>&1) &
(cd /home/local/atlas3; numactl --membind=0 --cpunodebind=0 make build > makebuild_3.log 2>&1) &
(cd /home/local/atlas4; numactl --membind=0 --cpunodebind=0 make build > makebuild_4.log 2>&1) &
# wait for all four background builds before reporting completion
wait
echo "finish"

- or will I also need *4* source directory trees?
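The same idea could be written as a loop. This is a sketch only, and it says nothing about whether ATLAS tolerates a shared source tree. The directories /home/local/atlas1 to atlas4 come from the question; the NODES and RUN variables, the build_cmd helper, and the choice to bind build i to NUMA node (i-1) mod NODES are assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/bash
# Print (or, with RUN=1, start) one "make build" per target directory.
# Assumptions: directories /home/local/atlas1..4 as in the question, and
# build i bound to NUMA node (i-1) mod NODES; set NODES to fit the machine.
NODES="${NODES:-2}"

build_cmd() {
    local i="$1"
    local node=$(( (i - 1) % NODES ))
    echo "cd /home/local/atlas$i && numactl --membind=$node --cpunodebind=$node make build > makebuild_$i.log 2>&1"
}

for i in 1 2 3 4; do
    if [ "${RUN:-0}" = 1 ]; then
        # Run each build in its own background subshell.
        ( eval "$(build_cmd "$i")" ) &
    else
        # Dry run: only show the command.
        build_cmd "$i"
    fi
done
# Wait for any background builds before returning.
wait
```

Running it with RUN=1 starts the builds; without it, the four command lines can be inspected first.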

Yours
Mikhail


Re: k10h post-BIOS patch effects

by dean gaudet

On Wed, 16 Jul 2008, Clint Whaley wrote:

> Guys,
>
> >Which Linux distro do you use ? You should kill all the
> >"power-sensual" daemons (like powersaved in SuSE) and remove the
> >corresponding kernel daemons like cpufreq.
>
> I tried two linux distros: kubuntu hardy heron & Fedora Core 9.  FC9 does
> it a *lot* less than kubuntu, but it still does it.  I have turned off
> "cool & quiet" in the bios, and cpuinfo shows full speed even as my
> timings drop by half.  I verified that cpufreq doesn't work after the BIOS
> turnoff (the scaling directories are missing from ACPI).
>
> I more & more suspect the problem is in the motherboard.  Dean (I think it was)
> mentioned that thermal throttling is broken in the Phenom; I wonder if
> the mobo assumes it works and does some voltage things it can't handle
> in response to OS calls.

hmm that could be possible ... a few choices for figuring this out --

for reference, fam10h BKDG:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116.PDF

and fam10h revision guide:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41322.PDF

try this:

setpci -d 1022:1204 64.l

that should print out the "F3x64 Hardware Thermal Control (HTC) Register"
... if bit 0 is non-zero then HTC is enabled.  try disabling it like so:

setpci -d 1022:1204 64.l=0
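The bit test described above can be done on the printed value without touching the hardware again. A minimal sketch, assuming the value is the 8-digit hex string that setpci prints for a ".l" read; the function name htc_state is made up for illustration.

```shell
#!/bin/bash
# Decode bit 0 (the HTC enable bit, per the message above) from the 32-bit
# hex value that "setpci ... 64.l" prints. Pure arithmetic; no hardware access.
htc_state() {
    # Interpret the argument as base-16, with or without leading zeros.
    local value=$(( 16#$1 ))
    if (( value & 1 )); then
        echo "HTC enabled"
    else
        echo "HTC disabled"
    fi
}

htc_state 00000000   # pass the value printed by setpci here
```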


> Whatever it is, it is affected by OS, so it is not pure hardware.  But,
> I wonder if the OS sends some signal that the mobo should ignore,
> but instead attempts something the k10h can't do . . .
>
> Anyway, if anyone can tell me OS & mobo combinations that they have seen
> work for the Phenom, I'd appreciate it.

it's been a while since i've built atlas -- but i'll give it a spin on my
phenom and report back.  3.9.0 is good enough?


> >BIOS patch was declared as leading to some performance degradation.
>
> Yeah, but I did not expect that massive die-off for a cache-dominated
> algorithm like GEMM.  For HPC, the slowdown is massive and pervasive,
> but the TLB bug is triggered daily.

are you sure it's the TLB bug?  in lots of testing i've never tripped the
erratum 298 problem.

if you want to experiment with the workarounds, build
http://code.google.com/p/iotools/ and put it into your PATH.

then execute a script something like this:

for cpu in `awk '/^processor/ {print $3}' /proc/cpuinfo`; do
        # disable erratum 298 workaround
        wrmsr $cpu 0xc0010015 $(and $(rdmsr $cpu 0xc0010015) $(not $(shl 1 3)))
        wrmsr $cpu 0xc0011023 $(and $(rdmsr $cpu 0xc0011023) $(not $(shl 1 1)))

        # disable erratum 309 workaround
        wrmsr $cpu 0xc0011023 $(and $(rdmsr $cpu 0xc0011023) $(not $(shl 1 23)))
done

you can get more info on both workarounds from the revision guide above.
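For anyone who wants to see what the and/not/shl combination in the loop above computes before writing an MSR, the same arithmetic can be tried in plain bash. This is a sketch only: clear_bit is a made-up helper, and the sample value is illustrative, not read from a real machine.

```shell
#!/bin/bash
# Clear one bit of a 64-bit value: value AND NOT (1 << bit).
# This mirrors the and/not/shl pipeline in the loop above using bash
# arithmetic only; it does not read or write any MSR.
clear_bit() {
    local value=$(( $1 ))
    local bit=$2
    printf '0x%016x\n' $(( value & ~(1 << bit) ))
}

# Illustrative value with bit 3 set; the result has bit 3 cleared.
clear_bit 0x0000000001000018 3
```

Clearing a bit that is already zero leaves the value unchanged, so re-running the loop is harmless in that respect.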

-dean


Re: k10h post-BIOS patch effects

by dean gaudet

on a phenom:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9600 Quad-Core Processor
stepping        : 2
cpu MHz         : 2306.997

in a M3A32-MVP DELUXE mobo ... whose bios info i can describe only as:

        Vendor: American Megatrends Inc.
        Version: 0801

(based on dmidecode)

it's running ubuntu feisty server (and powernow/etc aren't loaded)

i get the following results.

-dean


*******************************************************************************
*******************************************************************************
*******************************************************************************
*       BEGAN ATLAS3.9.0  INSTALL OF SECTION 0-0-0 ON 07/19/2008 AT 09:50     *
*******************************************************************************
*******************************************************************************
*******************************************************************************





IN STAGE 1 INSTALL:  SYSTEM PROBE/AUX COMPILE
   Level 1 cache size calculated as 64KB.

   dFPU: Separate multiply and add instructions with 4 cycle pipeline.
         Apparent number of registers : 13
         Register-register performance=4511.70MFLOPS
   sFPU: Separate multiply and add instructions with 4 cycle pipeline.
         Apparent number of registers : 13
         Register-register performance=4511.70MFLOPS


IN STAGE 2 INSTALL:  TYPE-DEPENDENT TUNING


STAGE 2-1: TUNING PREC='d' (precision 1 of 4)


   STAGE 2-1-1 : BUILDING BLOCK MATMUL TUNE
      The best matmul kernel was ATL_dmm8x1x120_L1pf.c, NB=40, written by R. Clint Whaley
      Performance: 8057.51MFLOPS (349.42 percent of of detected clock rate)
        (Gen case got 3928.97MFLOPS)
      mmNN   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
               Performance = 3839.81 (47.66 of copy matmul, 166.51 of clock)
      mmNT   : ma=0, lat=6, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
               Performance = 3291.98 (40.86 of copy matmul, 142.76 of clock)
      mmTN   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
               Performance = 3799.57 (47.16 of copy matmul, 164.77 of clock)
      mmTT   : ma=0, lat=2, nb=40, mu=12, nu=1 ku=40, ff=0, if=12, nf=1
               Performance = 3296.25 (40.91 of copy matmul, 142.94 of clock)



   STAGE 2-1-2: CacheEdge DETECTION
      CacheEdge set to 3145728 bytes


   STAGE 2-1-3: LARGE/SMALL CASE CROSSOVER DETECTION


   STAGE 2-1-3: COPY/NO-COPY CROSSOVER DETECTION
      done.


   STAGE 2-1-4: LEVEL 3 BLAS TUNE
      done.


   STAGE 2-1-5: GEMV TUNE
      gemvN : chose routine 3:ATL_gemvN_1x1_1a.c written by R. Clint Whaley
              Yunroll=32, Xunroll=1, using 100 percent of L1
              Performance = 1394.39 (17.31 of copy matmul, 60.47 of clock)
      gemvT : chose routine 105:ATL_gemvT_2x16_1.c written by R. Clint Whaley
              Yunroll=2, Xunroll=16, using 100 percent of L1
              Performance = 1374.56 (17.06 of copy matmul, 59.61 of clock)


   STAGE 2-1-6: GER TUNE
      ger : chose routine 1:ATL_ger1_axpy.c written by R. Clint Whaley
            mu=16, nu=1, using  0.51 percent of L1 Cache
              Performance = 809.66 (10.05 of copy matmul, 35.11 of clock)


STAGE 2-2: TUNING PREC='s' (precision 2 of 4)


   STAGE 2-2-1 : BUILDING BLOCK MATMUL TUNE
      The best matmul kernel was ATL_smm6x1x120_sse.c, NB=120, written by R. Clint Whaley
      Performance: 15012.39MFLOPS (651.01 percent of of detected clock rate)
        (Gen case got 4435.46MFLOPS)
      mmNN   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3834.01 (25.54 of copy matmul, 166.26 of clock)
      mmNT   : ma=0, lat=2, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3370.67 (22.45 of copy matmul, 146.17 of clock)
      mmTN   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3959.61 (26.38 of copy matmul, 171.71 of clock)
      mmTT   : ma=0, lat=3, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3486.42 (23.22 of copy matmul, 151.19 of clock)



   STAGE 2-2-2: CacheEdge DETECTION
      CacheEdge set to 3145728 bytes


   STAGE 2-2-3: LARGE/SMALL CASE CROSSOVER DETECTION


   STAGE 2-2-3: COPY/NO-COPY CROSSOVER DETECTION
      done.


   STAGE 2-2-4: LEVEL 3 BLAS TUNE
      done.


   STAGE 2-2-5: GEMV TUNE
      gemvN : chose routine 9:ATL_gemvN_32x4_1.c written by R. Clint Whaley
              Yunroll=32, Xunroll=4, using 100 percent of L1
              Performance = 1761.79 (11.74 of copy matmul, 76.40 of clock)
      gemvT : chose routine 105:ATL_gemvT_2x16_1.c written by R. Clint Whaley
              Yunroll=2, Xunroll=16, using 100 percent of L1
              Performance = 1984.77 (13.22 of copy matmul, 86.07 of clock)


   STAGE 2-2-6: GER TUNE
      ger : chose routine 1:ATL_ger1_axpy.c written by R. Clint Whaley
            mu=16, nu=1, using  1.00 percent of L1 Cache
              Performance = 1323.34 ( 8.81 of copy matmul, 57.39 of clock)


STAGE 2-3: TUNING PREC='z' (precision 3 of 4)


   STAGE 2-3-1 : BUILDING BLOCK MATMUL TUNE
      The best matmul kernel was ATL_dmm14x1x56_sse2pABC.c, NB=56, written by R. Clint Whaley
      Performance: 7856.61MFLOPS (340.70 percent of of detected clock rate)
        (Gen case got 4166.97MFLOPS)
      mmNN   : ma=0, lat=4, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
               Performance = 3946.90 (50.24 of copy matmul, 171.16 of clock)
      mmNT   : ma=0, lat=8, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
               Performance = 3589.62 (45.69 of copy matmul, 155.66 of clock)
      mmTN   : ma=0, lat=2, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
               Performance = 3959.21 (50.39 of copy matmul, 171.69 of clock)
      mmTT   : ma=0, lat=4, nb=40, mu=8, nu=1 ku=40, ff=0, if=9, nf=1
               Performance = 3599.80 (45.82 of copy matmul, 156.11 of clock)



   STAGE 2-3-2: CacheEdge DETECTION
      CacheEdge set to 3145728 bytes
      zdNKB set to 0 bytes


   STAGE 2-3-3: LARGE/SMALL CASE CROSSOVER DETECTION


   STAGE 2-3-3: COPY/NO-COPY CROSSOVER DETECTION
      done.


   STAGE 2-3-4: LEVEL 3 BLAS TUNE
      done.


   STAGE 2-3-5: GEMV TUNE
      gemvN : chose routine 3:ATL_cgemvN_1x1_1a.c written by R. Clint Whaley
              Yunroll=32, Xunroll=1, using 99 percent of L1
              Performance = 2835.03 (36.08 of copy matmul, 122.94 of clock)
      gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
              Yunroll=2, Xunroll=8, using 99 percent of L1
              Performance = 2116.02 (26.93 of copy matmul, 91.76 of clock)


   STAGE 2-3-6: GER TUNE
      ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
            mu=16, nu=1, using  0.76 percent of L1 Cache
              Performance = 1609.07 (20.48 of copy matmul, 69.78 of clock)


STAGE 2-4: TUNING PREC='c' (precision 4 of 4)


   STAGE 2-4-1 : BUILDING BLOCK MATMUL TUNE
      The best matmul kernel was ATL_smm6x1x120_sse.c, NB=120, written by R. Clint Whaley
      Performance: 14625.27MFLOPS (634.23 percent of of detected clock rate)
        (Gen case got 4415.67MFLOPS)
      mmNN   : ma=0, lat=8, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3934.66 (26.90 of copy matmul, 170.63 of clock)
      mmNT   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3615.18 (24.72 of copy matmul, 156.77 of clock)
      mmTN   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3953.05 (27.03 of copy matmul, 171.42 of clock)
      mmTT   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
               Performance = 3678.05 (25.15 of copy matmul, 159.50 of clock)



   STAGE 2-4-2: CacheEdge DETECTION
      CacheEdge set to 3145728 bytes
      csNKB set to 0 bytes


   STAGE 2-4-3: LARGE/SMALL CASE CROSSOVER DETECTION


   STAGE 2-4-3: COPY/NO-COPY CROSSOVER DETECTION
      done.


   STAGE 2-4-4: LEVEL 3 BLAS TUNE
      done.


   STAGE 2-4-5: GEMV TUNE
      gemvN : chose routine 3:ATL_cgemvN_1x1_1a.c written by R. Clint Whaley
              Yunroll=32, Xunroll=1, using 86 percent of L1
              Performance = 5542.02 (37.89 of copy matmul, 240.33 of clock)
      gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
              Yunroll=2, Xunroll=8, using 86 percent of L1
              Performance = 2548.30 (17.42 of copy matmul, 110.51 of clock)


   STAGE 2-4-6: GER TUNE
      ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
            mu=16, nu=1, using  0.75 percent of L1 Cache
              Performance = 3173.71 (21.70 of copy matmul, 137.63 of clock)


STAGE 3: GENERAL LIBRARY BUILD


STAGE 4: POST-BUILD TUNING
   done.


STAGE 4-2: Threading install
   done.

*******************************************************************************
*******************************************************************************
*******************************************************************************
*      FINISHED ATLAS3.9.0  INSTALL OF SECTION 0-0-0 ON 07/19/2008 AT 10:02   *
*******************************************************************************
*******************************************************************************
*******************************************************************************





Re: k10h post-BIOS patch effects

by dean gaudet

btw i should add:

# for cpu in `awk '/^processor/ {print $3}' /proc/cpuinfo`; do (echo $cpu; rdmsr $cpu 0xc0010015; rdmsr $cpu 0xc0011023) | fmt -w1000; done
0 0x0000000001000010 0x0000000000200020
1 0x0000000001000010 0x0000000000200020
2 0x0000000001000010 0x0000000000200020
3 0x0000000001000010 0x0000000000200020

so neither the erratum 298 nor the erratum 309 workaround was enabled...  and this is a B2 part...
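The reading of those values can be checked bit by bit. A minimal sketch, assuming the bit positions used in the workaround script earlier in the thread (0xc0010015 bit 3, 0xc0011023 bits 1 and 23); bit_is_set and report are made-up helper names.

```shell
#!/bin/bash
# Report whether a given bit is set in a hex MSR value, using the bit
# positions from the workaround script earlier in the thread
# (0xc0010015 bit 3 and 0xc0011023 bits 1 and 23).
bit_is_set() {
    local value=$(( $1 ))
    local bit=$2
    # Exit status 0 (true) when the bit is 1.
    (( (value >> bit) & 1 ))
}

report() {
    local name=$1 value=$2 bit=$3
    if bit_is_set "$value" "$bit"; then
        echo "$name bit $bit: set"
    else
        echo "$name bit $bit: clear"
    fi
}

# Values as printed by rdmsr above.
report 0xc0010015 0x0000000001000010 3
report 0xc0011023 0x0000000000200020 1
report 0xc0011023 0x0000000000200020 23
```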

# setpci -d 1022:1204 64.l
00000000

nor was HTC.

-dean


>       gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
>               Yunroll=2, Xunroll=8, using 99 percent of L1
>               Performance = 2116.02 (26.93 of copy matmul, 91.76 of clock)
>
>
>    STAGE 2-3-6: GER TUNE
>       ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
>             mu=16, nu=1, using  0.76 percent of L1 Cache
>               Performance = 1609.07 (20.48 of copy matmul, 69.78 of clock)
>
>
> STAGE 2-4: TUNING PREC='c' (precision 4 of 4)
>
>
>    STAGE 2-4-1 : BUILDING BLOCK MATMUL TUNE
>       The best matmul kernel was ATL_smm6x1x120_sse.c, NB=120, written by R. Clint Whaley
>       Performance: 14625.27MFLOPS (634.23 percent of of detected clock rate)
>         (Gen case got 4415.67MFLOPS)
>       mmNN   : ma=0, lat=8, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3934.66 (26.90 of copy matmul, 170.63 of clock)
>       mmNT   : ma=0, lat=4, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3615.18 (24.72 of copy matmul, 156.77 of clock)
>       mmTN   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3953.05 (27.03 of copy matmul, 171.42 of clock)
>       mmTT   : ma=0, lat=5, nb=40, mu=12, nu=1 ku=40, ff=1, if=13, nf=1
>                Performance = 3678.05 (25.15 of copy matmul, 159.50 of clock)
>
>
>
>    STAGE 2-4-2: CacheEdge DETECTION
>       CacheEdge set to 3145728 bytes
>       csNKB set to 0 bytes
>
>
>    STAGE 2-4-3: LARGE/SMALL CASE CROSSOVER DETECTION
>
>
>    STAGE 2-4-3: COPY/NO-COPY CROSSOVER DETECTION
>       done.
>
>
>    STAGE 2-4-4: LEVEL 3 BLAS TUNE
>       done.
>
>
>    STAGE 2-4-5: GEMV TUNE
>       gemvN : chose routine 3:ATL_cgemvN_1x1_1a.c written by R. Clint Whaley
>               Yunroll=32, Xunroll=1, using 86 percent of L1
>               Performance = 5542.02 (37.89 of copy matmul, 240.33 of clock)
>       gemvT : chose routine 102:ATL_cgemvT_2x2_0.c written by R. Clint Whaley
>               Yunroll=2, Xunroll=8, using 86 percent of L1
>               Performance = 2548.30 (17.42 of copy matmul, 110.51 of clock)
>
>
>    STAGE 2-4-6: GER TUNE
>       ger : chose routine 1:ATL_cger1_axpy.c written by R. Clint Whaley
>             mu=16, nu=1, using  0.75 percent of L1 Cache
>               Performance = 3173.71 (21.70 of copy matmul, 137.63 of clock)
>
>
> STAGE 3: GENERAL LIBRARY BUILD
>
>
> STAGE 4: POST-BUILD TUNING
>    done.
>
>
> STAGE 4-2: Threading install
>    done.
>
> *******************************************************************************
> *******************************************************************************
> *******************************************************************************
> *      FINISHED ATLAS3.9.0  INSTALL OF SECTION 0-0-0 ON 07/19/2008 AT 10:02   *
> *******************************************************************************
> *******************************************************************************
> *******************************************************************************
>
>
>
>

Re: k10h post-BIOS patch effects

by Clint Whaley

Dean (& guys),

OK, here are a few things.  First, there is a modified xdfc available at:
   www.cs.utsa.edu/~whaley/dload/xdfc
It is my normal kernel timer, which has been modified to keep calling the K10h
kernel 1K times.  When I run this on my Phenom, most of the numbers are roughly
8Gflop, but then it drops to 4Gflop for a lot of them.  Can anyone with a
Phenom run this executable and make sure yours doesn't do this to you too
(i.e. the perf drop happens rarely enough that it can be missed)?

If this executable won't work for you (e.g., different libraries) you can
create it yourself by changing line 634 of ATLAS/tune/blas/gemm/fc.c from:
   #define NSAMPLE 3
to:
   #define NSAMPLE 1024

And then issuing (in $BLDdir/tune/blas/gemm):
   make ummcase pre=d DMCFLAGS="-x assembler-with-cpp" \
        mmrout=CASES/ATL_dmm8x1x120_L1pf.c nb=40
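
The NSAMPLE edit described above could also be scripted rather than done by
hand. A minimal sketch follows; the function name set_nsample is made up for
illustration, and it matches the define by name rather than by line number,
on the assumption that fc.c has only one "#define NSAMPLE" line.

```shell
# Hypothetical helper (not part of ATLAS): rewrite "#define NSAMPLE <n>" in FILE
# so that it uses COUNT samples. Assumes the file holds a single such define.
set_nsample() {
    file="$1"; count="$2"
    tmp="$file.tmp.$$"
    # Keep the "#define NSAMPLE " prefix (with its spacing) and swap the number.
    sed "s/^\([[:space:]]*#define[[:space:]][[:space:]]*NSAMPLE[[:space:]][[:space:]]*\)[0-9][0-9]*/\1$count/" \
        "$file" > "$tmp" && mv "$tmp" "$file"
}
```

Usage would be something like "set_nsample ATLAS/tune/blas/gemm/fc.c 1024",
followed by the make command given above.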

>try this:
>setpci -d 1022:1204 64.l

bit 0 was zero for me :(

>setpci -d 1022:1204 64.l=0

did this (despite above), and ./xdfc still behaves the same way
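
If more than one device matches 1022:1204, setpci prints one value per
device, so a small filter saves decoding them by eye. This is only a sketch:
report_bit0 is a made-up name, and it assumes the setpci output is bare hex
words with no 0x prefix.

```shell
# Hypothetical helper: read setpci output on stdin (assumed to be one bare hex
# word per matching device) and report whether bit 0 is set in each word.
report_bit0() {
    while read -r word; do
        [ -n "$word" ] || continue          # skip blank lines
        if [ $(( 0x$word & 1 )) -eq 1 ]; then
            echo "$word: bit 0 set"
        else
            echo "$word: bit 0 clear"
        fi
    done
}
```

It would be fed as "setpci -d 1022:1204 64.l | report_bit0".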

>> Yeah, but I did not expect that massive die-off for a cache-dominated
>> algorithm like GEMM.  For HPC, the slowdown is massive and pervasive,
>> but the TLB bug is triggered daily.
>
>are you sure it's the TLB bug?  in lots of testing i've never tripped the
>erratum 298 problem.

No, but we were debugging a lot of large parallel DGEMMs, and the machine was
dying roughly once a day.  I applied the patch, and the machine has been
stable since, so I just assumed.  However, it could have been something
we are doing differently in our testing (as the code has changed), or an
unrelated other thing in the BIOS . . .

>if you want to experiment with the workarounds, build
>http://code.google.com/p/iotools/ and put it into your PATH.
>
>then execute a script something like this:
>
>for cpu in `awk '/^processor/ {print $3}' /proc/cpuinfo`; do
>        # disable erratum 298 workaround
>        wrmsr $cpu 0xc0010015 $(and $(rdmsr $cpu 0xc0010015) $(not $(shl 1 3)))
>        wrmsr $cpu 0xc0011023 $(and $(rdmsr $cpu 0xc0011023) $(not $(shl 1 1)))
>
>        # disable erratum 309 workaround
>        wrmsr $cpu 0xc0011023 $(and $(rdmsr $cpu 0xc0011023) $(not $(shl 1 23)))
>done

Does this program let you change BIOS settings on the fly?  Or are these
Linux workarounds that I would need to apply with an unpatched BIOS?  I
guess I need to check it out with savana & compile it (I didn't see any
simple download link)?

Thanks,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************


Re: k10h post-BIOS patch effects

by dean gaudet-2

On Sun, 20 Jul 2008, Clint Whaley wrote:

> Dean (& guys),
>
> OK, here are a few things.  First, there is a modified xdfc available at:
>    www.cs.utsa.edu/~whaley/dload/xdfc
> It is my normal kernel timer, which has been modified to keep calling the K10h
> kernel 1K times.  When I run this on my Phenom, most of the numbers are roughly
> 8Gflop, but then it drops to 4Gflop for a lot of them.  Can anyone with a
> Phenom run this executable and make sure yours doesn't do this to you too
> (i.e. the perf drop happens rarely enough that it can be missed)?

it seems to be 8.4gflop for my 2.3ghz phenom:

# ./xdfc
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8366.61
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8377.58
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8377.81
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8376.21
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8377.64
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8377.92
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8377.55
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8377.40
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8377.10
dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.314, mflop=8376.29
...

i've let it run for a couple minutes now, no changes... actually it
finished, still no significant changes in the mflop -- it climbed a tiny
amount:

dNB=40, ld=40,40,40, mu=4, nu=4, ku=1, lat=4, pf=0: time=0.313, mflop=8391.98
dNB=40, time=0.313, mflop=8394.20
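
With 1K samples, an occasional half-speed line is easy to miss by eye. A
possible way to scan for them is sketched below, assuming every sample line
carries an "mflop=" field as in the output above. The name flag_slow_samples
and the default 0.75 cutoff are illustrative choices, not anything xdfc
provides.

```shell
# Hypothetical helper: flag samples whose mflop value falls below a fraction
# (default 0.75) of the best sample seen. Reads xdfc-style lines on stdin.
flag_slow_samples() {
    frac="${1:-0.75}"
    awk -v frac="$frac" '
        # Pull the number that follows "mflop=" on each sample line.
        match($0, /mflop=[0-9.]+/) {
            v = substr($0, RSTART + 6, RLENGTH - 6) + 0
            n++; val[n] = v; line[n] = $0
            if (v > best) best = v
        }
        END {
            slow = 0
            for (i = 1; i <= n; i++)
                if (val[i] < frac * best) { slow++; print "SLOW: " line[i] }
            printf "%d of %d samples below %.0f%% of best (%.2f)\n", slow, n, frac * 100, best
        }'
}
```

Run as "./xdfc | flag_slow_samples 0.75"; a machine that drops from roughly
8Gflop to 4Gflop should then show up as SLOW lines plus a non-zero count in
the summary.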


> >if you want to experiment with the workarounds, build
> >http://code.google.com/p/iotools/ and put it into your PATH.
> >
> >then execute a script something like this:
> >
> >for cpu in `awk '/^processor/ {print $3}' /proc/cpuinfo`; do
> >        # disable erratum 298 workaround
> >        wrmsr $cpu 0xc0010015 $(and $(rdmsr $cpu 0xc0010015) $(not $(shl 1 3)))
> >        wrmsr $cpu 0xc0011023 $(and $(rdmsr $cpu 0xc0011023) $(not $(shl 1 1)))
> >
> >        # disable erratum 309 workaround
> >        wrmsr $cpu 0xc0011023 $(and $(rdmsr $cpu 0xc0011023) $(not $(shl 1 23)))
> >done
>
> This program allows you to change stuff in the BIOS on the fly?  Or is this
> linux workarounds I need to be able to apply with an unpatched BIOS?  I
> guess I need to check it out with savana & compile it (I didn't see any
> simple download link)?

yeah you probably need to check it out with svn and build it.
you probably also need to "modprobe msr".

the BIOS workarounds for those errata amount to setting those bits (i.e.
bit 3 of MSR 0xc0010015, bit 1 of 0xc0011023 and bit 23 of 0xc0011023)
... for these specific workarounds we can tweak them dynamically.  if you
execute that script i pasted you'll disable the workarounds... which
will make your B2 behave like a B3.
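
For reference, the bit arithmetic the script performs can be written in
plain shell. This is only a sketch of the masking step: clear_bit and
bit_is_set are made-up names, and nothing here touches an MSR by itself.

```shell
# Hypothetical helpers for the masking step only; no MSR is read or written.

# Print VALUE with bit number BIT cleared, in hex.
clear_bit() {
    printf '0x%x\n' $(( $1 & ~(1 << $2) ))
}

# Succeed (exit status 0) if bit number BIT of VALUE is set.
bit_is_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}
```

For example, "clear_bit 0xff 3" prints 0xf7. With iotools installed, the
erratum 298 line for MSR 0xc0010015 would then read something like
"wrmsr $cpu 0xc0010015 $(clear_bit $(rdmsr $cpu 0xc0010015) 3)", assuming
rdmsr prints a value the shell can parse as a number.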

-dean


Re: atlas-3.9.0

by Clint Whaley

>Then is it possible to execute 4 simultaneous "make build" w/ONE
>"shared" source directory tree and w/4 different target directories,
>i.e. something like

ATLAS can certainly build any number of BLDdirs from one SRCdir.  I would
not recommend firing multiple ones off at once, though, as the load from
one install will interfere (probably strongly) with other installs' timings.
So, it is fine to use the same source tree for multiple installs, but I
suggest serializing the installs.
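
A serialized set of installs from one source tree could be driven by a loop
along these lines. It is a sketch, not a tested recipe: serial_installs and
the ATLAS_BUILD_CMD override are made up for illustration, SRCdir must be an
absolute path, and any configure flags would still need to be added.

```shell
# Hypothetical driver: run installs one after another from a single SRCdir.
# Usage: serial_installs /abs/path/to/ATLAS bld1 bld2 ...
# ATLAS_BUILD_CMD is an illustrative override; it defaults to "make build".
serial_installs() {
    src="$1"; shift
    for bld in "$@"; do
        mkdir -p "$bld" || return 1
        # The subshell keeps the cd local; each install finishes before the
        # next one starts, so they cannot disturb each other's timings.
        ( cd "$bld" && "$src/configure" && ${ATLAS_BUILD_CMD:-make build} ) \
            > "$bld.log" 2>&1 \
            || { echo "install in $bld failed, see $bld.log" >&2; return 1; }
    done
}
```

Each BLDdir gets its own log next to it (bld1.log, bld2.log, ...), and the
loop stops at the first install that fails.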

Cheers,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************


Re: atlas-3.9.0

by Mikhail Kuzminsky

In message from Clint Whaley <whaley@...> (Mon, 21 Jul 2008
11:12:44 -0500):

>>Then is it possible to execute 4 simultaneous "make build" w/ONE
>>"shared" source directory tree and w/4 different target directories,
>>i.e. something like
>
>ATLAS can certainly build any number of BLDdirs from one SRCdir.  I would
>not recommend firing multiple ones off at once, though, as the load from
>one install will interfere (probably strongly) with other installs' timings.
>So, it is fine to use the same source tree for multiple installs, but I
>suggest serializing the installs.

Eh, the idea is just to see what happens with simultaneous tuning :-)!
I.e., which CacheEdge value will be obtained if you run 4
simultaneous builds on a 4-core CPU?

Mikhail


>
>Cheers,
>Clint
>
>**************************************************************************
>** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
>**************************************************************************