3.9.12

by Clint Whaley

Guys,

I have finally released 3.9.12.  It took so long because I completely
rewrote the GER tuning framework, and I only just got everything working
about as well as before.  Some platforms may see a moderate speedup in
GER/SYR/SYR2/HER/HER2 performance, but I'm not sure; right now, the new
framework mainly lays the groundwork for future speedups.

One thing that is different is that ATLAS now tunes GER for 3 contexts:
out-of-cache, in-L1-cache, and in-L2-cache.  Right now, cblas_<pre>ger will
always get you the out-of-cache version, but you can directly call the
ATLAS internal routine ATL_<pre>ger_L[2,1] if you know your A matrix
is cache-contained.  On some platforms, this can get you good speedups,
though again, I'll need further GER development to fully exploit this
tuning difference.
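To make the three contexts concrete, here is a minimal C sketch, assuming column-major storage and double precision. `ref_dger` is a plain reference rank-1 update, A += alpha*x*y', which is the operation cblas_dger performs (without any of ATLAS's tuning). `classify_ger` is a hypothetical helper that estimates whether a call's working set could fit in L1 or L2. The names `ref_dger`, `classify_ger`, and the cache sizes are illustrative assumptions, not ATLAS code, and the real ATL_<pre>ger_L1/ATL_<pre>ger_L2 signatures are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache sizes in bytes; substitute your machine's values. */
#define L1_BYTES (32u * 1024u)
#define L2_BYTES (512u * 1024u)

typedef enum { GER_IN_L1, GER_IN_L2, GER_OUT_OF_CACHE } ger_context;

/* Reference (untuned) GER, column-major: A[i + j*lda] += alpha * x[i] * y[j].
 * ATLAS's tuned kernels compute the same result, only faster. */
void ref_dger(int M, int N, double alpha, const double *x,
              const double *y, double *A, int lda)
{
    assert(lda >= M);
    for (int j = 0; j < N; j++) {
        const double ay = alpha * y[j];      /* hoist the per-column scale */
        for (int i = 0; i < M; i++)
            A[i + (size_t)j * lda] += x[i] * ay;
    }
}

/* Hypothetical helper: classify a GER call by the size of its operands
 * (A, x and y).  This only says the data *could* fit; whether it is
 * actually resident depends on what the caller touched beforehand. */
ger_context classify_ger(int M, int N)
{
    size_t bytes = ((size_t)M * N + (size_t)M + (size_t)N) * sizeof(double);
    if (bytes <= L1_BYTES) return GER_IN_L1;
    if (bytes <= L2_BYTES) return GER_IN_L2;
    return GER_OUT_OF_CACHE;
}
```

With these assumed sizes, a 40x40 update (about 13 KB) classifies as in-L1, 200x200 (about 316 KB) as in-L2, and 1000x1000 as out-of-cache. A size check alone is not proof of residency, which is why the choice of the L1/L2 entry points is left to a caller who knows A was just used.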

The other big news is that ATLAS can now autobuild LAPACK 3.2 (3.1 also).
I got rid of most of the lapack building options in favor of
   --with-netlib-lapack-tarfile=<tarfile>

ATLAS uses a small sed script to autoadapt LAPACK's make.inc.ex to
use the ATLAS-compatible flags, and then uses LAPACK's makefiles to
build the library for maximal portability across LAPACK versions.

I have not tried doing LAPACK testing with this new lapack autobuild, for
either 3.1 or 3.2.

It also looks like building shared libs is currently broken, because ATLAS
puts some objects that are never called into the library (e.g., OpenMP calls);
this doesn't cause any problems for the static .a libraries, but you can get
missing symbols when you try to use the .so's built with --shared . . .
However, it looks like this was broken in 3.9.11 as well . . .

I have mostly updated ATLAS/doc/atlas_install.pdf to cover the new features.

There are also several important bug fixes; see the ChangeLog below.
I think threaded QR should get a decent speedup from one of them (the
ilaenv fix).

Cheers,
Clint

ATLAS 3.9.12 released 08/06/09, changes from 3.9.11
   * Complete rewrite of GER, SYR/HER and SYR2/HER2:
     - New tuning mechanism tunes GER for in-L1, in-L2, and out-of-cache
       * Call ATL_<pre>ger_L1 if data known to be in L1 cache
       * Call ATL_<pre>ger_L2 if data known to be in L2 cache
     - Most architectures now lack GER archdefs
       * Provided GER archdefs for 64-bit K10h and Core2
     - atlas_devel not yet updated
   * Relatively untested standard timing/tester code available for all
     tuned kernels (GER fairly well tested)
     - atlas_[mv,r1,mm]parse.h reads standard input/output files
     - atlas_[mv,r1,mm]testtime.h provides tester/timer calls for kernels
   * Can compile both lapack 3.2 and 3.1 with --with-netlib-lapack-tarfile
     - Removed support for other ways of building lapack
     - atlas_install mostly updated
   * Bug fixes
     - Fixed BETA=0 SCAL NaN-propagation bug (no more call to ATL_set)
     - Fixed C/Z GEMM JITcp bug where C was read when BETA=0
     - Fixed threaded LAPACK calling serial ilaenv  (QR speedup)
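The C/Z GEMM fix concerns the BLAS convention that when BETA is zero, C is output only: it need not be initialized on entry and must not be read. Reading it matters because under IEEE arithmetic 0*NaN and 0*Inf are both NaN, so multiplying stale contents of C by zero can leak garbage into the result. Below is a minimal C sketch of a BETA-scaling step that follows this rule; `scale_by_beta` and `scale_naive` are hypothetical names for illustration, not ATLAS's actual JIT-copy code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: C[i] = beta * C[i] for n elements, following the
 * BLAS rule that beta == 0 means "overwrite", not "multiply". */
void scale_by_beta(size_t n, double beta, double *C)
{
    assert(n == 0 || C != NULL);
    if (beta == 0.0) {
        /* Never read C: stale NaN/Inf values must not propagate. */
        for (size_t i = 0; i < n; i++)
            C[i] = 0.0;
    } else if (beta != 1.0) {
        for (size_t i = 0; i < n; i++)
            C[i] *= beta;
    }
    /* beta == 1: C is left unchanged. */
}

/* Naive version for contrast: reads C even when beta == 0, so
 * 0.0 * NAN and 0.0 * INFINITY both produce NaN. */
void scale_naive(size_t n, double beta, double *C)
{
    for (size_t i = 0; i < n; i++)
        C[i] *= beta;
}
```

Branching on beta == 0 (and skipping the pass entirely for beta == 1) also saves a read of C, which is part of why GEMM implementations special-case these values rather than always multiplying.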

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************

_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel