3.9.1

Guys,

I have released 3.9.1. It is a bugfix release on 3.9.0.

Cheers,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
|
|
dgemm performance dependence on CacheEdge value

I performed simultaneous installation of 4 ATLAS 3.9.1 instances on an Opteron 2350. The CacheEdge values obtained were 384K and 512K (depending on the build/tune thread). OK, 512K is what I want: 512K = size(L3)/4. I thought that I would see gemm kernel performance differences, at least for the large-matrix tests. But make time shows practically no difference between the results for different CacheEdge values (2 MB, 512K or 384K).

So there are a few questions.

1) How does (d)gemm performance (for large matrices) depend on the CacheEdge value?
2) Does ATLAS 3.9.x "know" that the Opteron K10 has a 512K L2 cache *in addition* to the L3 cache? I saw that 3.8.2 used the *L2* cache size for the CacheEdge value.
3) Do the gemm kernels use software prefetch? IMHO, prefetch on K10 (as opposed to K8) goes directly to the L1 cache (instead of the L2 cache as on K8).

Yours
Mikhail Kuzminsky
Computer Assistance to Chemical Research Center
Zelinsky Institute of Organic Chemistry
Moscow
|
|
Re: dgemm performance dependence on CacheEdge value

Sorry, just found this in my queue . . .

>I performed simultaneous installation of 4 ATLAS 3.9.1 instances on
>an Opteron 2350.
>The CacheEdge values obtained were 384K and 512K (depending on the
>build/tune thread).

Not sure what 4 simultaneous installs are supposed to do for you. If you want to tune for parallel performance, then I recommend:
   http://math-atlas.sourceforge.net/errata.html#SMPCE

>OK, 512K is what I want: 512K = size(L3)/4.

Not sure that CE is hitting the shared L3: I would guess the L2 . . .

>I thought that I would see gemm kernel performance differences, at least
>for the large-matrix tests. But make time shows practically no difference
>between the results for different CacheEdge values (2 MB, 512K or 384K).

It's been a while since I've seen more than 5% from varying CacheEdge. The last machine for which it was critical was the DEC ev5, where you had a tiny L1 and a 96K L2 that CE blocked for; I invented CE for this machine, where it gave a 20% boost in performance. It seems to me that CE gives less of a boost than it used to: I put this down to ATLAS having better prefetch support these days, so that you get only modest improvements from 2-level cache blocking when the kernel is already L1-blocked with aggressive prefetch . . .

>So there are a few questions.
>
>1) How does (d)gemm performance (for large matrices) depend on the
>CacheEdge value?

These days, it provides a limitation on workspace, but doesn't make huge differences. It *can* improve overall application performance, particularly in parallel (though, again, the effects are small).

>2) Does ATLAS 3.9.x "know" that the Opteron K10 has a 512K L2 cache *in
>addition* to the L3 cache?
>I saw that 3.8.2 used the *L2* cache size for the CacheEdge value.

ATLAS only does 2 levels of explicit blocking. AFAIK, the K10h is kind of funky: I believe the caches are exclusive, so the L3 is kind of like a huge victim cache. In this case, ATLAS will almost assuredly block for the L2 with CE (since the L3 is slower).

>3) Do the gemm kernels use software prefetch? IMHO, prefetch on K10 (as
>opposed to K8) goes directly to the L1 cache (instead of the L2
>cache as on K8).

ATLAS's GEMM kernels heavily use prefetch. I believe the earlier AMD machines also prefetched to the L1 (I know the original Athlon did). If you use 3.9.1, ATLAS has a kernel that targets the K10h (with prefetch) more effectively. I am presently working on 3.9.2, which should be out next week at the latest. In the meantime, if you want to use 3.9.1, be sure to apply the bug fixes documented at:
   http://sourceforge.net/tracker/index.php?func=detail&aid=2024948&group_id=23725&atid=379482

Cheers,
Clint
|
|
Re: dgemm performance dependence on CacheEdge value

In message from Clint Whaley <whaley@...> (Fri, 08 Aug 2008 09:12:57 -0500):

>>I performed simultaneous installation of 4 ATLAS 3.9.1 instances on
>>an Opteron 2350.
>>The CacheEdge values obtained were 384K and 512K (depending on the
>>build/tune thread).
>
>Not sure what 4 simultaneous installs are supposed to do for you. If you
>want to tune for parallel performance, then I recommend:
>   http://math-atlas.sourceforge.net/errata.html#SMPCE

Thanks! It's a better approach than mine :-)

>>OK, 512K is what I want: 512K = size(L3)/4.
>Not sure that CE is hitting the shared L3: I would guess the L2 . . .

ATLAS 3.9.0:
STAGE 2-1-2: CacheEdge DETECTION
   CacheEdge set to 2621440 bytes
   =================^ IMHO it's L3 ^============

or from dMMCACHEEDGE.LOG:

gemm xdfindCE -f res/atlas_cacheedge.h
TA  TB     M     N     K  alpha   beta  CacheEdge      TIME    MFLOPS
==  ==  ====  ====  ====  =====  =====  =========  ========  ========
 T   N  1600  1600  1600   1.00   1.00          0     1.228   6671.69
 T   N  1600  1600  1600   1.00   1.00         16    -2.000      0.00
 T   N  1600  1600  1600   1.00   1.00         32    -2.000      0.00
 T   N  1600  1600  1600   1.00   1.00         64     1.514   5409.95
 T   N  1600  1600  1600   1.00   1.00        128     1.279   6405.71
 T   N  1600  1600  1600   1.00   1.00        256     1.248   6565.56
 T   N  1600  1600  1600   1.00   1.00        512     1.239   6610.67
 T   N  1600  1600  1600   1.00   1.00       1024     1.227   6673.78
 T   N  1600  1600  1600   1.00   1.00       2048     1.227   6673.98
 T   N  1600  1600  1600   1.00   1.00       4096     1.227   6674.22
 T   N  1600  1600  1600   1.00   1.00       8192     1.228   6672.71

Initial CE=4096KB, mflop=6674.22

 T   N  1600  1600  1600   1.00   1.00       3072     1.227   6674.70
 T   N  1600  1600  1600   1.00   1.00       2560     1.227   6676.11
 T   N  1600  1600  1600   1.00   1.00       2304     1.227   6674.23
 T   N  1600  1600  1600   1.00   1.00       2816     1.227   6675.75

Best CE=2560KB, mflop=6676.11
====================================================================

These 3.9.0 data were the reason why I thought that the L3 was being used for CE.

>>2) Does ATLAS 3.9.x "know" that the Opteron K10 has a 512K L2 cache *in
>>addition* to the L3 cache?
>>I saw that 3.8.2 used the *L2* cache size for the CacheEdge value.
>
>ATLAS only does 2 levels of explicit blocking. AFAIK, the K10h is kind of
>funky: I believe the caches are exclusive, so the L3 is kind of like
>a huge victim cache. In this case, ATLAS will almost assuredly block for
>the L2 with CE (since the L3 is slower).

The L2 also holds only victims from the L1. But looking at the obtained CE value (about 2 MB!), I thought of the L3.

>If you use 3.9.1, ATLAS has a kernel that targets the K10h (with prefetch)
>more effectively. I am presently working on 3.9.2, which should be out next
>week at the latest. In the meantime, if you want to use 3.9.1, be sure
>to apply the bug fixes documented at:
>   http://sourceforge.net/tracker/index.php?func=detail&aid=2024948&group_id=23725&atid=379482

I tuned CE with 4 simultaneous installations of 3.9.1. There was nothing like a hang, although 1 of the 4 installations finished with an error.

Yours
Mikhail
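Editor's note: the numbers in the xdfindCE log quoted in this message can be checked with a few lines of Python. This is only a sketch. The (CacheEdge, MFLOPS) pairs are copied from the log; the names `RUNS` and `relative_gain` are made up for illustration, and dropping the CE=16 and CE=32 rows assumes that their time of -2.000 marks a run that was not measured.

```python
# (CacheEdge in KB, MFLOPS) pairs copied from the dMMCACHEEDGE.LOG excerpt.
# Assumption: the CE=16 and CE=32 rows (time -2.000, 0.00 MFLOPS) are
# "not measured" markers, so they are left out.
RUNS = [
    (0, 6671.69), (64, 5409.95), (128, 6405.71), (256, 6565.56),
    (512, 6610.67), (1024, 6673.78), (2048, 6673.98), (4096, 6674.22),
    (8192, 6672.71), (3072, 6674.70), (2560, 6676.11), (2304, 6674.23),
    (2816, 6675.75),
]


def relative_gain(runs, baseline_ce=0):
    """Percent change in MFLOPS of each run relative to the baseline CE."""
    base = dict(runs)[baseline_ce]
    return {ce: (mflops - base) / base * 100.0 for ce, mflops in runs}


gains = relative_gain(RUNS)
best_ce = max(RUNS, key=lambda run: run[1])[0]

# The stage 2-1-2 message reports bytes, the log reports KB.
print("best CE:", best_ce, "KB;", 2621440 // 1024, "KB from the byte count")
for ce in (64, 512, 2560):
    print(f"CE={ce:>4} KB: {gains[ce]:+.2f}% vs CE=0")
```

The best CE from the table (2560 KB) is the same value as the 2621440 bytes reported by the install stage. Its gain over CE=0 is well under 0.1%, and CE=512 comes out slightly below CE=0, so on this log the choice among the larger CE values barely moves the gemm rate.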
|
|
Re: dgemm performance dependence on CacheEdge value

>gemm xdfindCE -f res/atlas_cacheedge.h
>TA  TB     M     N     K  alpha   beta  CacheEdge      TIME    MFLOPS
>==  ==  ====  ====  ====  =====  =====  =========  ========  ========
> T   N  1600  1600  1600   1.00   1.00          0     1.228   6671.69
> T   N  1600  1600  1600   1.00   1.00         16    -2.000      0.00
> T   N  1600  1600  1600   1.00   1.00         32    -2.000      0.00
> T   N  1600  1600  1600   1.00   1.00         64     1.514   5409.95
> T   N  1600  1600  1600   1.00   1.00        128     1.279   6405.71
> T   N  1600  1600  1600   1.00   1.00        256     1.248   6565.56
> T   N  1600  1600  1600   1.00   1.00        512     1.239   6610.67
> T   N  1600  1600  1600   1.00   1.00       1024     1.227   6673.78
> T   N  1600  1600  1600   1.00   1.00       2048     1.227   6673.98
> T   N  1600  1600  1600   1.00   1.00       4096     1.227   6674.22
> T   N  1600  1600  1600   1.00   1.00       8192     1.228   6672.71
>
>Initial CE=4096KB, mflop=6674.22
>
> T   N  1600  1600  1600   1.00   1.00       3072     1.227   6674.70
> T   N  1600  1600  1600   1.00   1.00       2560     1.227   6676.11
> T   N  1600  1600  1600   1.00   1.00       2304     1.227   6674.23
> T   N  1600  1600  1600   1.00   1.00       2816     1.227   6675.75
>
>Best CE=2560KB, mflop=6676.11
>====================================================================

If you look at this output, you see that the performance of CE=2M is absolutely indistinguishable from CE=0 (no L2 blocking). In such a case, ATLAS uses CE, since its partitioning of K reduces the workspace needs of large problems. So, what you are seeing is that this system doesn't get any benefit from L2 cache blocking, but that we can afford multiple writes of C for large matrices . . .

Cheers,
Clint
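Editor's note: the trade-off described in this reply (partitioning K shrinks the panels that must be held at once, at the cost of writing C several times) can be illustrated with a toy Python sketch. This is not ATLAS code. `pick_kp` is a made-up rule for turning a CacheEdge value into a K chunk size, `nb` stands in for an L1 block size, and `gemm_k_partitioned` is a naive triple loop with none of ATLAS's copying, blocking, or prefetch.

```python
def pick_kp(cache_edge_bytes, nb, elem_size=8):
    """Hypothetical rule (not ATLAS's formula): the largest multiple of nb
    such that a kp-by-nb panel of A plus one of B fit in cache_edge_bytes."""
    if cache_edge_bytes <= 0:
        return None  # CE=0: K is not partitioned
    kp = cache_edge_bytes // (2 * nb * elem_size)
    return max(nb, (kp // nb) * nb)


def gemm_k_partitioned(A, B, C, kp=None):
    """Compute C += A*B in place, walking K in chunks of kp.

    Returns (passes over C, elements of A and B touched per chunk)."""
    m, k, n = len(A), len(B), len(B[0])
    step = k if kp is None else min(kp, k)
    passes = 0
    for k0 in range(0, k, step):
        k1 = min(k0 + step, k)
        for i in range(m):
            for j in range(n):
                s = 0.0
                for p in range(k0, k1):
                    s += A[i][p] * B[p][j]
                C[i][j] += s  # C is read and written once per K chunk
        passes += 1
    return passes, (m + n) * step


A = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
C_full = [[0.0, 0.0], [0.0, 0.0]]
C_part = [[0.0, 0.0], [0.0, 0.0]]

full = gemm_k_partitioned(A, B, C_full)        # K in one piece
part = gemm_k_partitioned(A, B, C_part, kp=2)  # K split into two chunks
print(full, part, C_full == C_part)
```

Both calls produce the same C. The partitioned call makes two passes over C but touches half as many elements of A and B per chunk, which is the shape of the trade described above: when the extra writes of C cost nothing measurable, the smaller per-chunk footprint comes for free.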