3.9.15

by Clint Whaley

Guys,

I've been quiet recently; I have been overwhelmed at work.  However, I have
been working every spare moment on ATLAS.  The task I've been working
on is changing the ATLAS matmul search to incorporate a new code generator!

Chad Zalkin (a student at UTSA working with me on his MS) created an SSE-enabled
code generator for ATLAS this summer, and I have just finished getting
ATLAS's framework to utilize it.  The code generator uses gcc/icc intrinsics
to vectorize the main GEMM kernel.  The SSE generator's main purpose is to
reduce ATLAS's reliance on hand-tuning for vectorized kernels.  On some
machines, it provides speedup over existing hand-tuned kernels (e.g., my
Core2 system gets about 8% speedup for single precision).  I haven't
tracked it down yet, but the code generator never seems to provide speedup
on the AMD systems I have access to, though it does seem to help Intel systems.
I'm guessing gcc is generating an instruction stream that Intel chips like but
AMD chips do not, but it'll have to be looked into . . .
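
The post does not show what the generated code looks like.  As a rough
illustration of "using intrinsics to vectorize a GEMM kernel", here is a
minimal hand-written sketch, NOT output of Chad's generator: a row-major
single-precision C += A*B loop where each SSE register holds four
consecutive elements of a row of C.  The function names, the row-major
layout, and the requirement that N be a multiple of 4 are all assumptions
made for the example; the intrinsics themselves are the standard ones from
<xmmintrin.h>, which gcc and icc both accept on x86.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_*_ps */

/* Hypothetical scalar reference: C += A*B, row-major,
 * A is M x K, B is K x N, C is M x N. */
static void sgemm_ref(int M, int N, int K,
                      const float *A, const float *B, float *C)
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            float s = C[i * N + j];
            for (int k = 0; k < K; k++)
                s += A[i * K + k] * B[k * N + j];
            C[i * N + j] = s;
        }
}

/* Same computation vectorized with SSE: each __m128 holds four
 * consecutive elements of a row of C, so N must be a multiple of 4. */
static void sgemm_sse(int M, int N, int K,
                      const float *A, const float *B, float *C)
{
    assert(N % 4 == 0);
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j += 4) {
            __m128 c = _mm_loadu_ps(&C[i * N + j]);      /* C(i, j..j+3) */
            for (int k = 0; k < K; k++) {
                __m128 a = _mm_set1_ps(A[i * K + k]);    /* broadcast A(i,k) */
                __m128 b = _mm_loadu_ps(&B[k * N + j]);  /* B(k, j..j+3) */
                c = _mm_add_ps(c, _mm_mul_ps(a, b));     /* c += a*b, 4 lanes */
            }
            _mm_storeu_ps(&C[i * N + j], c);
        }
}
```

A real generated kernel would also unroll and block for registers and
cache; this sketch only shows the basic broadcast/multiply/add pattern.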

Chad is still working on the code generator: right now it does not work
for single-precision complex.  Since I tend not to work very hard on
hand-tuning single precision, performance there should probably go up even
further once this is fixed.

I have not provided architectural defaults for the new search, so the
install can be quite long in 3.9.15.  However, I thought people would
be interested to see the new code generator; if you want a faster install,
just continue using 3.9.14 for now.

I have also started the process of rationalizing ATLAS's search.  ATLAS
is now built so that others can easily plug their own searches and/or
code generators into the ATLAS framework.  I still need to produce some
documentation explaining how to do this, but you can find most things
you need in ATLAS/include/atlas_mmparse.h and ATLAS/include/atlas_mmtesttime.h.
The other nice thing that I think people will like is that I have quietened
down the GEMM search.  All the compilation output and so forth now goes to
/dev/null, so it is easier to see the timing results as the search runs . . .
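
The post does not describe the interface in atlas_mmparse.h or
atlas_mmtesttime.h, so the following is only a generic sketch of the idea
of a pluggable empirical search, with entirely hypothetical names
(candidate_t, search_best, mm_ijk, mm_ikj) that are NOT ATLAS's API.  Each
candidate kernel is registered in a table, timed with the standard C
clock(), and the fastest one is kept; in the spirit of the quieter search,
the only output is one timing line per candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical plug-in interface (NOT the real ATLAS API): a candidate is a
 * named kernel computing C = A*B for n x n row-major doubles. */
typedef void (*kernel_fn)(int n, const double *A, const double *B, double *C);

typedef struct {
    const char *name;  /* label printed next to the timing result */
    kernel_fn run;     /* kernel supplied by a search or code generator */
} candidate_t;

/* Two example candidates that differ only in loop order. */
static void mm_ijk(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

static void mm_ikj(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++) {
            double a = A[i * n + k];
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}

/* CPU seconds per call, averaged over `reps` calls. */
static double time_candidate(const candidate_t *c, int n, int reps,
                             const double *A, const double *B, double *C)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        c->run(n, A, B, C);
    return (double)(clock() - t0) / CLOCKS_PER_SEC / reps;
}

/* Time every candidate, print one line per result, return the fastest index. */
static size_t search_best(const candidate_t *cands, size_t ncand, int n,
                          int reps, const double *A, const double *B,
                          double *C)
{
    assert(ncand > 0 && reps > 0);
    size_t best = 0;
    double best_t = 0.0;
    for (size_t i = 0; i < ncand; i++) {
        double t = time_candidate(&cands[i], n, reps, A, B, C);
        printf("%-8s %.3e s/call\n", cands[i].name, t);
        if (i == 0 || t < best_t) {
            best = i;
            best_t = t;
        }
    }
    return best;
}
```

A more careful version would also check each candidate's result against a
reference kernel before timing it, so that a fast but wrong kernel can
never win the search.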

Cheers,
Clint

ATLAS 3.9.15 released 10/10/09, changes from 3.9.14
   * Addition of Chad Zalkin's SSE GEMM generator to ATLAS
   * Support for external searches and use of standard matmul search routines in:
     - include/atlas_mmparse.h
     - include/atlas_mmtesttime.h
   * Numerous search changes to incorporate the above into the ATLAS matmul install
     - Changed matmul install to be much quieter

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************

_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel