3.9.12

Guys,
I have finally released 3.9.12. It took so long because I completely rewrote the GER tuning framework, and I just now got everything working about as well as before. Some platforms may see a moderate speedup in GER/SYR/SYR2/HER/HER2 performance, I'm not sure (right now, the new framework mainly lays the groundwork for future speedups).

One thing that is different is that ATLAS now tunes GER for 3 contexts: out-of-cache, in-L1-cache, and in-L2-cache. Right now, cblas_<pre>ger will always get you the out-of-cache version, but you can directly call the ATLAS internal routine ATL_<pre>ger_L[2,1] if you know your A matrix is cache-contained. On some platforms, this can get you good speedups, though again, I'll need further GER development to exploit this tuning difference fully.

The other big news is that ATLAS can now autobuild LAPACK 3.2 (3.1 also). I got rid of most of the lapack building options in favor of --with-netlib-lapack-tarfile=<tarfile>. ATLAS uses a small sed script to autoadapt LAPACK's make.inc.ex to use the ATLAS-compatible flags, and then uses LAPACK's makefiles to build the library for maximal portability across LAPACK versions. I have not tried doing LAPACK testing with this new lapack autobuild, for either 3.1 or 3.2.

It also looks like building shared libs is currently broken, because ATLAS puts some objects that are never called into the library (e.g., OpenMP calls); this doesn't cause any problems for .a's, but you can get missing symbols when you try to use the .so's built by --shared . . . However, it looks like this was broken in 3.9.11 as well . . .

I have mostly updated the ATLAS/doc/atlas_install.pdf with the new stuff. There are also several important bug fixes; see the changelog excerpt below. I think threaded QR should get a decent speedup due to one of the bug fixes.

Cheers,
Clint

ATLAS 3.9.12 released 08/06/09, changes from 3.9.11
 * Complete rewrite of GER, SYR/HER and SYR2/HER2:
    - New tuning mechanism tunes GER for in-L1, in-L2, and out-of-cache
       * Call ATL_<pre>ger_L1 if data known to be in L1 cache
       * Call ATL_<pre>ger_L2 if data known to be in L2 cache
    - Most architectures now lack GER arch defs
       * Provided GER archdefs 64-bit K10h and Core2
    - atlas_devel not yet updated
 * Relatively untested standard timing/tester code available for all
   tuned kernels (GER fairly well tested)
    - atlas_[mv,r1,mm]parse.h reads standard input/output files
    - atlas_[mv,r1,mm]testtime.h provides tester/timer calls for kernels
 * Can compile both lapack 3.2 and 3.1 with --with-netlib-lapack-tarfile
    - Removed support for other ways of building lapack
    - atlas_install mostly updated
 * Bug fixes
    - Fixed BETA=0 SCAL NaN-propagation bug (no more call to ATL_set)
    - Fixed C/Z GEMM JITcp bug where C was read when BETA=0
    - Fixed threaded LAPACK calling serial ilaenv (QR speedup)

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************

_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
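
For readers unfamiliar with the three GER contexts mentioned in the post, here is a minimal C sketch of the idea, not ATLAS code. It shows what a double-precision GER computes (the column-major rank-1 update A += alpha*x*y^T) and a size-based guess at which context a given problem might fall into. The names ref_dger, ger_pick_context, and ger_context, and the two cache sizes, are all hypothetical and chosen for illustration; real cache sizes are platform-specific, and the post does not give the signatures of the ATL_<pre>ger_L1/ATL_<pre>ger_L2 routines, so they are not called here.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: illustrative cache sizes in bytes; real values vary by CPU. */
#define ASSUMED_L1_BYTES (32u * 1024u)
#define ASSUMED_L2_BYTES (512u * 1024u)

/* Hypothetical labels for the three tuning contexts named in the post. */
typedef enum {
    GER_OUT_OF_CACHE = 0,
    GER_IN_L1 = 1,
    GER_IN_L2 = 2
} ger_context;

/*
 * Guess a context from the combined footprint of A (m x n), x (m) and y (n).
 * This is only a size heuristic: data that would fit in a cache is not
 * necessarily resident in it, which is why the post says to use the
 * in-cache routines only when you know A is cache-contained.
 */
ger_context ger_pick_context(size_t m, size_t n)
{
    size_t bytes = (m * n + m + n) * sizeof(double);
    if (bytes <= ASSUMED_L1_BYTES)
        return GER_IN_L1;
    if (bytes <= ASSUMED_L2_BYTES)
        return GER_IN_L2;
    return GER_OUT_OF_CACHE;
}

/*
 * Reference (untuned) rank-1 update on a column-major matrix:
 *   A[i + j*lda] += alpha * x[i*incx] * y[j*incy]
 * Only positive strides are handled, to keep the sketch short.
 */
void ref_dger(int m, int n, double alpha,
              const double *x, int incx,
              const double *y, int incy,
              double *a, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    assert(incx > 0 && incy > 0);
    for (int j = 0; j < n; j++) {
        const double t = alpha * y[(size_t)j * incy]; /* scale once per column */
        double *col = a + (size_t)j * lda;            /* start of column j */
        for (int i = 0; i < m; i++)
            col[i] += t * x[(size_t)i * incx];
    }
}
```

A caller could use ger_pick_context(m, n) to decide whether trying one of the in-cache ATLAS routines is worth benchmarking, and fall back to cblas_<pre>ger (the out-of-cache version, per the post) otherwise.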