I have released 3.9.65. It fixes a bug in the threaded L2BLAS that was
caused by overflow of an integer used to index very large parallel
arrays. I made changes to fix this in the L2 code specifically, but it
is a problem that can hit anywhere, and a rewrite of pretty much everything
to use size_t is the only real fix.
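
To illustrate the class of bug described above, here is a minimal sketch, not the actual ATLAS code. The function names are hypothetical, and it assumes a platform where size_t is 64 bits. It computes the offset of column j in a column-major matrix with leading dimension lda, once in 32-bit arithmetic and once after casting to size_t. The 32-bit version uses unsigned types so that the wraparound is well defined; with a plain signed int the overflow would be undefined behavior in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: column offset computed in 32-bit arithmetic.
 * Unsigned multiplication wraps modulo 2^32, so a large j*lda silently
 * yields a small, wrong offset. */
static uint32_t col_offset_32(uint32_t j, uint32_t lda)
{
    return j * lda;
}

/* Hypothetical helper: the same offset with both operands cast to size_t
 * BEFORE multiplying, so the product is formed at full width
 * (assumes a 64-bit size_t). */
static size_t col_offset_size_t(int j, int lda)
{
    return (size_t)j * (size_t)lda;
}
```

For j = lda = 70000 the true offset is 4900000000, which is what the size_t version returns on a 64-bit platform, while the 32-bit version wraps to 605032704. Note that the cast has to be applied to the operands; casting the result of an int multiplication is too late, because the overflow has already happened.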
Other than this, the big news is a significant improvement in ARM
single-precision (real & complex) performance, where we improve from about
1.1 flops/cycle to around 1.3 flops/cycle for out-of-cache s/cGEMM (peak is 2).
I believe an install from archdefs can be slightly faster, as I think I've
changed things so that the atlas_zd/csNKB.h probes are not rerun if your
archdefs already include them (only some do).
ATLAS 3.9.65 released 02/07/12, changes from 3.9.64:
* Improved single-precision ARM GEMM kernel.
* Improved s/c ARM archdefaults
* Fixed L2 threaded bugs by casting ldamul to size_t