Guys,
I have released 3.9.65. It fixes a bug in the threaded L2BLAS that was
caused by overflow of an integer used to index very large parallel
arrays. I made changes to fix this in the L2 code specifically, but it
is a problem that can hit anywhere, and rewriting pretty much everything
to use size_t is the only real fix.
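
To illustrate the kind of overflow described above, here is a minimal sketch. It is not the ATLAS code: col_offset and offset_overflows_int are hypothetical helper names, and the sketch assumes a column-major array with non-negative int indices. The point is only that the operands are widened to size_t before the multiply, so the product is never formed in int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from ATLAS): element offset of column j in a
 * column-major array with leading dimension lda.  Both operands are cast
 * to size_t BEFORE the multiply, so the product is computed in size_t
 * rather than in int. */
static size_t col_offset(int j, int lda)
{
    assert(j >= 0 && lda >= 0);   /* sketch assumes non-negative inputs */
    return (size_t)j * (size_t)lda;
}

/* Returns 1 if the product j*lda would not fit in a 32-bit signed int,
 * i.e. if computing the offset in int arithmetic would overflow. */
static int offset_overflows_int(int j, int lda)
{
    return col_offset(j, lda) > (size_t)INT32_MAX;
}
```

For example, with j = lda = 50000 the offset is 2,500,000,000 elements, which exceeds the largest 32-bit signed int (2,147,483,647) but is computed correctly once the operands are size_t.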
Other than this, the big news is a significant improvement in ARM
single-precision (real & complex) performance, where we improve from about
1.1 flops/cycle to around 1.3 flops/cycle for out-of-cache s/cGEMM (peak
is 2 flops/cycle).
I believe an install from archdefs can be slightly faster, as I think I've
improved things so that the atlas_zd/csNKB.h probes are not rerun if your
archdefs already include them (only some do).
Cheers,
Clint
ATLAS 3.9.65 released 02/07/12, changes from 3.9.64:
* Improved single-precision ARM GEMM kernel.
* Improved s/c ARM archdefaults.
* Fixed L2 threaded bugs by casting ldamul to size_t
**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************
_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel