
Re: dgemm performance dependence on CacheEdge value

by Clint Whaley-2


>gemm xdfindCE -f res/atlas_cacheedge.h
>TA  TB       M       N       K   alpha    beta  CacheEdge       TIME    MFLOPS
>==  ==  ======  ======  ======  ======  ======  =========  =========  ========
>
>  T   N    1600    1600    1600    1.00    1.00          0      1.228   6671.69
>  T   N    1600    1600    1600    1.00    1.00         16     -2.000      0.00
>  T   N    1600    1600    1600    1.00    1.00         32     -2.000      0.00
>  T   N    1600    1600    1600    1.00    1.00         64      1.514   5409.95
>  T   N    1600    1600    1600    1.00    1.00        128      1.279   6405.71
>  T   N    1600    1600    1600    1.00    1.00        256      1.248   6565.56
>  T   N    1600    1600    1600    1.00    1.00        512      1.239   6610.67
>  T   N    1600    1600    1600    1.00    1.00       1024      1.227   6673.78
>  T   N    1600    1600    1600    1.00    1.00       2048      1.227   6673.98
>  T   N    1600    1600    1600    1.00    1.00       4096      1.227   6674.22
>  T   N    1600    1600    1600    1.00    1.00       8192      1.228   6672.71
>
>Initial CE=4096KB, mflop=6674.22
>
>  T   N    1600    1600    1600    1.00    1.00       3072      1.227   6674.70
>  T   N    1600    1600    1600    1.00    1.00       2560      1.227   6676.11
>  T   N    1600    1600    1600    1.00    1.00       2304      1.227   6674.23
>  T   N    1600    1600    1600    1.00    1.00       2816      1.227   6675.75
>
>Best CE=2560KB, mflop=6676.11
>====================================================================
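
For a sense of scale in the table above, here is a quick calculation of how
far a few CacheEdge settings sit from the CE=0 run. This is only a sketch:
the MFLOPS values are copied from the quoted output, and the helper name
rel_diff_percent is made up for this illustration.

```python
# MFLOPS values copied from the quoted xdfindCE run (CacheEdge in KB -> MFLOPS).
# The CE=16 and CE=32 rows are left out because they report no valid timing.
mflops = {
    0: 6671.69, 64: 5409.95, 128: 6405.71, 256: 6565.56, 512: 6610.67,
    1024: 6673.78, 2048: 6673.98, 2560: 6676.11, 4096: 6674.22, 8192: 6672.71,
}

def rel_diff_percent(ce_kb, baseline_kb=0):
    """Percent difference in MFLOPS of one CacheEdge setting vs. a baseline."""
    base = mflops[baseline_kb]
    return 100.0 * (mflops[ce_kb] - base) / base

if __name__ == "__main__":
    for ce in (64, 2048, 2560):
        print(f"CE={ce:5d}KB: {rel_diff_percent(ce):+.2f}% vs CE=0")
```

CE=2048KB and the reported best, CE=2560KB, both land within a tenth of a
percent of CE=0, while CE=64KB is roughly 19% slower.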

If you look at this output, you see that the performance of CE=2M is absolutely
indistinguishable from CE=0 (no L2 blocking).  In such a case, ATLAS uses
the nonzero CE, since its partitioning of K reduces the workspace needs of
large problems.  So, what you are seeing is that this system doesn't get any
benefit from L2 cache blocking, but that we can afford multiple writes of C
for large matrices . . .
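
The trade-off described above (partition K, touch less workspace at a time,
write C more than once) can be sketched in a few lines. This is a toy
illustration, not ATLAS code: the function name gemm_tn_kpart and the kb
parameter are hypothetical, and how ATLAS actually derives its K partition
from CacheEdge is not shown here.

```python
def gemm_tn_kpart(A, B, C, alpha=1.0, beta=1.0, kb=None):
    """Toy C = alpha * A^T * B + beta * C with K split into chunks of kb.

    A is stored K x M, B is K x N, C is M x N (lists of lists).
    kb=None (or 0) means a single chunk, i.e. no K partitioning.
    Returns the number of passes made over C.
    """
    K, M, N = len(A), len(C), len(C[0])
    kb = max(1, kb or K)
    # Apply beta once up front; every K chunk then accumulates into C.
    for i in range(M):
        for j in range(N):
            C[i][j] *= beta
    c_passes = 0
    for k0 in range(0, K, kb):
        k1 = min(k0 + kb, K)
        # Only rows k0..k1-1 of A and B are needed in this pass, so any
        # copied workspace scales with kb rather than with the full K.
        for i in range(M):
            for j in range(N):
                acc = 0.0
                for k in range(k0, k1):
                    acc += A[k][i] * B[k][j]
                C[i][j] += alpha * acc  # C is rewritten once per K chunk
        c_passes += 1
    return c_passes

if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]              # K=4, M=2
    B = [[1, 0, 2], [0, 1, 1], [2, 1, 0], [1, 1, 1]]  # K=4, N=3
    C_full = [[0.0] * 3 for _ in range(2)]
    C_part = [[0.0] * 3 for _ in range(2)]
    print(gemm_tn_kpart(A, B, C_full))        # one pass over C
    print(gemm_tn_kpart(A, B, C_part, kb=2))  # two passes over C
    print(C_full == C_part)
```

Both calls produce the same C; the partitioned call just updates C once per
K chunk, which is the extra cost this system evidently tolerates.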

Cheers,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************

