mv performance
I'm a new ATLAS user and tried version 3.8.2. I couldn't reach the reference level of performance for double precision matrix-vector multiply (no transpose, kMV_N), which is the operation of most interest to me. Should I be surprised? What should I be doing to improve it?
Here's the "make time" output:
Reference clock rate=1597Mhz, new rate=2390Mhz
Refrenc : % of clock rate achieved by reference install
Present : % of clock rate achieved by present ATLAS install
                    single precision                  double precision
            ********************************   *******************************
                 real             complex           real             complex
            ---------------  ---------------  ---------------  ---------------
Benchmark   Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
=========   ======= =======  ======= =======  ======= =======  ======= =======
kSelMM        346.9   369.1    337.9   368.7    181.7   184.5    180.6   177.0
kGenMM        167.6   177.2    179.0   158.0    159.0   152.7    153.0   158.0
kMM_NT        126.4   126.6    137.7   134.2    105.8   100.7    116.7   119.7
kMM_TN        151.2   134.2    156.7   142.9    124.1   116.6    125.4   134.2
BIG_MM        325.3   336.0    319.8   333.6    171.1   177.6    168.3   174.0
kMV_N          50.5    45.2     96.7    93.4     48.2    38.3     91.1    55.9
kMV_T          54.3    56.0     63.0    62.1     32.0    30.3     49.8    43.3
kGER           39.3    44.5     69.9    71.3     20.9    22.4     44.3    39.7
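To make the gap concrete, here is a rough Python sketch that compares the double precision Level 2 rows from the table. It assumes that "% of clock rate" means MFLOPS divided by the clock rate in MHz, so each percentage can be turned back into an approximate MFLOPS figure; that conversion is my reading of the headers, not something the output states. The names `DOUBLE`, `mflops` and `compare` are made up for this illustration.

```python
# Double precision columns copied from the "make time" table:
# (reference real, present real, reference complex, present complex)
DOUBLE = {
    "kMV_N": (48.2, 38.3, 91.1, 55.9),
    "kMV_T": (32.0, 30.3, 49.8, 43.3),
    "kGER": (20.9, 22.4, 44.3, 39.7),
}

REF_MHZ = 1597.0  # "Reference clock rate" from the output
NEW_MHZ = 2390.0  # "new rate" from the output


def mflops(pct, mhz):
    """Assumed conversion: percent of clock rate -> approximate MFLOPS."""
    return pct / 100.0 * mhz


def compare(ref_pct, new_pct):
    """Return (present/reference ratio, reference MFLOPS, present MFLOPS)."""
    return (new_pct / ref_pct,
            mflops(ref_pct, REF_MHZ),
            mflops(new_pct, NEW_MHZ))


for name, (ref_r, new_r, ref_c, new_c) in DOUBLE.items():
    for kind, ref_pct, new_pct in (("real", ref_r, new_r),
                                   ("complex", ref_c, new_c)):
        ratio, ref_mf, new_mf = compare(ref_pct, new_pct)
        print(f"{name:6s} {kind:8s} ratio={ratio:.2f} "
              f"ref~{ref_mf:.0f} MFLOPS present~{new_mf:.0f} MFLOPS")
```

If that conversion is right, the present install is still ahead of the reference in absolute MFLOPS for real kMV_N, because the clock is faster, but it reaches only about 79% of the reference's per-clock efficiency (and about 61% for complex).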
The computer has 2 CPUs (from /proc/cpuinfo), each an AMD Opteron Processor 250, 2390 MHz, with 1024 KB cache.
It runs Red Hat Enterprise Linux 3. There are multiple users.
The standard gcc on this system is version 3.2.3 but I downloaded version 4.2.4 and used this configure line:
../configure --prefix=$HOME/ATLAS/atlas.a4 -Fa alg -fPIC -Ss kern $HOME/local/gcc-4.2.4/bin/cc
I installed the new version of Make.mvtune.
The automatic computation of CacheEdge doesn't work consistently in this multi-user environment, so I set it to 768KB. According to my experiments running xfindCE, the value doesn't matter much on one processor.
If you've gotten this far, thank you for your attention!
- Jeff Painter