tuning of Atlas on x86/NUMA

by Mikhail Kuzminsky

How is the ATLAS tuning process (for example, for the dgemm kernel)
organized on SMP/NUMA servers whose CPUs have a shared cache, for
example a dual-socket quad-core Opteron server?

If dgemm tuning takes the shared cache size into account but is run
only single-threaded (sequentially), it will assume that it can use
the whole cache (for example, the 2 MB L3 of the Opteron 2350). But
for multithreaded dgemm with 4 threads per CPU, only 512 KB of L3 is
available to each thread without a lot of cache misses. Therefore the
multithreaded version requires, IMHO, tuning that is independent of
the sequential version.
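
To make the arithmetic above concrete, here is a rough sketch in Python.
It is only a back-of-the-envelope estimate and not how ATLAS picks its
blocking factor: the helper names are hypothetical, and I assume the
cache is split evenly among threads and that three square
double-precision blocks (one each of A, B and C) must be resident at
once.

```python
import math

def per_thread_cache_bytes(shared_cache_bytes, threads_sharing):
    """Evenly split a shared cache among the threads that share it."""
    return shared_cache_bytes // threads_sharing

def max_square_block(cache_bytes, n_blocks=3, elem_bytes=8):
    """Largest NB such that n_blocks NB x NB blocks of doubles fit.

    Assumption: three blocks (A, B and C) must be in cache together.
    """
    return math.isqrt(cache_bytes // (n_blocks * elem_bytes))

L3_BYTES = 2 * 1024 * 1024   # 2 MB L3 of the Opteron 2350
THREADS_PER_CPU = 4          # quad-core, one thread per core

share = per_thread_cache_bytes(L3_BYTES, THREADS_PER_CPU)
print(share // 1024, "KB per thread")                     # 512 KB per thread
print("NB, whole L3:", max_square_block(L3_BYTES))        # 295
print("NB, per-thread share:", max_square_block(share))   # 147
```

Under these assumptions the block size that fits is roughly halved
(295 vs. 147), which is why a value tuned sequentially may be a poor
fit for the threaded case.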

The second question is about using process affinity (taskset on
Linux) and NUMA-aware memory allocation (numactl) during tuning.
Does the tuning process take these possibilities into account, or
are there no serious reasons to use taskset/numactl when tuning?
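
For illustration, this is the kind of pinning I mean, sketched in
Python with the Linux-only os.sched_setaffinity call (the same
mechanism taskset uses). The helper name pin_to_cpus is hypothetical,
and this covers CPU affinity only; NUMA memory placement would still
need numactl or libnuma.

```python
import os

def pin_to_cpus(cpus):
    """Pin the calling process to the given CPU ids (Linux only).

    Returns the resulting affinity set, or None where the call
    is not available on this platform.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)   # pid 0 means the calling process
    return os.sched_getaffinity(0)

if hasattr(os, "sched_getaffinity"):
    # Pick a CPU we are already allowed to run on, then pin to it.
    first_cpu = min(os.sched_getaffinity(0))
    print("pinned to:", pin_to_cpus({first_cpu}))
```

A timing run started after such a call stays on one core, so its
cache contents are not lost to migration between cores or sockets.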

Mikhail Kuzminsky
Computer Assistance to Chemical Research Center
Zelinsky Institute of Organic Chemistry
Moscow    

_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel