Mikhail,
ATLAS presently does almost no SMP-related empirical tuning. We are
presently looking at it, but the present package does none. Here's a link
you may be interested in from the errata:
http://math-atlas.sourceforge.net/errata.html#SMPCE
>How is the ATLAS tuning process (for example, for the dgemm kernel)
>organized for the case
>of SMP/NUMA servers w/CPUs having a shared cache? For example, for a
>dual-socket quad-core Opteron server?
>
>If dgemm tuning takes the shared cache size into account, and is tuned
>only "single threaded" (sequential run),
>then it will assume that it can use the whole cache (for example, the 2 MB L3
>of the Opteron 2350). But for multithreaded dgemm w/4 threads per CPU
>only 512K of L3 will be available per thread w/o a lot of cache misses.
>Therefore the multithreaded version requires, IMHO, tuning that is
>"independent" of the sequential version.
>And the second question is about the use of process affinity (taskset
>on Linux) and NUMA allocation of memory
>(using numactl) in the tuning process. Does it take
>these possibilities into account, or are there no serious reasons
>to use taskset/numactl in tuning?
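The cache arithmetic in the question can be sketched in a few lines. The 2 MB L3 and the 4 threads per socket come from the question; the even split between threads and the "three square double-precision blocks must fit" working-set model are simplifying assumptions made for illustration only, and are not how ATLAS actually selects its blocking factor.

```python
import math

DOUBLE_BYTES = 8  # size of one double-precision element


def per_thread_cache_bytes(shared_cache_bytes, threads_sharing):
    """Even split of a shared cache among concurrently running threads (assumption)."""
    if threads_sharing < 1:
        raise ValueError("threads_sharing must be at least 1")
    return shared_cache_bytes // threads_sharing


def largest_square_block(cache_bytes, blocks_resident=3):
    """Largest NB such that `blocks_resident` NB x NB double blocks fit in cache_bytes.

    Toy model: one block each of A, B and C is assumed to be resident at once.
    """
    return math.isqrt(cache_bytes // (blocks_resident * DOUBLE_BYTES))


L3_BYTES = 2 * 1024 * 1024   # 2 MB shared L3 (Opteron 2350, from the question)
THREADS_PER_SOCKET = 4       # quad-core, one dgemm thread per core

share = per_thread_cache_bytes(L3_BYTES, THREADS_PER_SOCKET)
print("per-thread share:", share // 1024, "KB")            # 512 KB
print("NB, whole cache: ", largest_square_block(L3_BYTES))
print("NB, per thread:  ", largest_square_block(share))
```

Under this toy model the block size that fits the whole L3 is about twice the one that fits a single thread's share, which is the gap between sequential and threaded tuning that the question describes.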
We are presently looking at the effects of using processor affinity. I have
no idea what numactl is; do you have a link?
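For reference, pinning a timing run to one CPU (roughly what `taskset -c` does from the shell) can also be done from inside a harness. The following is a minimal sketch, assuming Linux and Python's `os.sched_setaffinity`; `time_pinned` and the dummy workload are hypothetical names for illustration and are not part of ATLAS.

```python
import os
import time


def time_pinned(func, cpu):
    """Run func() with the process pinned to `cpu`; return elapsed seconds.

    The previous affinity mask is restored afterwards, even if func() raises.
    """
    previous = os.sched_getaffinity(0)      # 0 means the calling process
    os.sched_setaffinity(0, {cpu})
    try:
        start = time.perf_counter()
        func()
        return time.perf_counter() - start
    finally:
        os.sched_setaffinity(0, previous)


def dummy_kernel():
    """Stand-in workload (hypothetical); a real run would time the dgemm kernel."""
    return sum(i * i for i in range(100_000))


if hasattr(os, "sched_setaffinity"):        # available on Linux only
    cpu = min(os.sched_getaffinity(0))      # pick a CPU we are allowed to use
    print("cpu", cpu, "seconds", time_pinned(dummy_kernel, cpu))
```

Pinning keeps the scheduler from migrating the process between cores during a timing, which is one source of run-to-run noise in measurements of this kind.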
Cheers,
Clint
**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************