3.9.5: much improved threading

Guys,

I have finally gotten 3.9.5 out the door. It's been several months, but I've actually been busy the whole time. The big news is that I finally finished a complete rewrite of ATLAS's threading system. You will see only a small difference for 2-processor machines, but on 4 and 8 processor machines, the new threaded code can more than double your performance (assuming you aren't on a loser OS like MacOS X or FreeBSD, that don't possess processor affinity). I posted some factorization timings at:
   http://math-atlas.sourceforge.net/timing/3_9_5/index.html

This code is all pretty new, and since I rewrote everything, we can probably expect a segfault-fest, but it at least passes the sanity tests on Windows and Linux. It is still fairly rough, and as the timing page mentions, I need to do some tuning for small-case factorizations.

Let me know if you use it,
Clint

ATLAS 3.9.5 released 12/11/08, Changes from 3.9.4:
 * Complete rewrite of ATLAS threading system:
    - Now supports native windows threads in addition to pthreads
    - Use of master-last and affinity increases threaded performance, with an advantage that grows with P (almost no advantage for P=2, but for instance LU is more than 60% faster asymptotically on a P=8 Core2)
       + OS X and FreeBSD don't support processor affinity, and so their performance is still bad
 * Changed emit_buildinfo so that it replaces all control characters with spaces (prevents errors under windows).
 * Added dependency info for ATL_ilaenv so that it is recompiled once lapack tuning is complete
 * Fixed error in configure where it issues commands in wrong directory when the user builds lapack directly from a tarfile
 * Fixed typos in config.c where I used 'comp' rather than 'comps'.
 * Added mmtime_pt.c, which can allow us to find kernels that do well in parallel operation.
 * Various small configure fixes for windows

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************

------------------------------------------------------------------------------
SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada. The future of the web can't happen without you. Join us at MIX09 to help pave the way to the Next Web now. Learn more and register at
http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
Re: 3.9.5: much improved threading

On Dec 11, 2008, at 10:21 PM, Clint Whaley wrote:
> You will see only a small difference for 2-processor machines, but
> on 4 and 8 processor machines, the new threaded code can more than
> double your performance (assuming you aren't on a loser OS like
> MacOS X or FreeBSD, that don't possess processor affinity).

http://developer.apple.com/releasenotes/Performance/RN-AffinityAPI/
Re: 3.9.5: much improved threading

Ian,

>> You will see only a small difference for 2-processor machines, but
>> on 4 and 8 processor machines, the new threaded code can more than
>> double your performance (assuming you aren't on a loser OS like
>> MacOS X or FreeBSD, that don't possess processor affinity).
>
>http://developer.apple.com/releasenotes/Performance/RN-AffinityAPI/

Thanks for the link! I asked my contacts at Apple about this, and was told you guys do not support affinity, so it is a relief to see you guys doing something here. Do you have a link to more documentation? I'm finding this page a little sparse on usage details.

I did some digging, and found that I can translate a pthread_t into a thread_t using the pthread_mach_thread_np interface. However, since pthreads start executing when the pthread_create function is called, I guess I have to start them up, and then change their affinity after they have started running (the page suggests you start the thread, and then change its affinity before it starts running, and I don't see how this can be done in pthreads). This will blunt a lot of the advantage of affinity. The docs I found suggest that you should not use thread_t threads directly, so how are you supposed to start the thread up w/o starting it running? I don't suppose you have/are considering supporting something like Linux has, where you can modify the thread attribute using pthread_attr_setaffinity_np?

Can you supply some example code of using affinity with pthreads? The page has no example calls, and no mention of what values the thread affinity tag can take, and what it would mean if it had a given value.

I notice that it does not provide processor affinity, but rather something it describes as L2 affinity. I take this to mean that a thread will therefore be allowed to migrate between processors that share a cache. Our timings indicate you will lose a lot of the performance gain if this is the case. Do you presently have any way, or plans to add support for, true processor affinity? Why did you not add processor affinity when you mucked about with this L2 affinity?

Thanks,
Clint
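For readers who have not used the Linux interface mentioned above, here is a minimal sketch of fixing affinity in the thread attribute, so the thread is already pinned when it starts running. It assumes Linux with glibc (`pthread_attr_setaffinity_np` and `sched_getcpu` are GNU extensions); the function names `report_cpu`, `first_allowed_cpu`, and `run_pinned` are hypothetical and are not taken from ATLAS.

```c
#define _GNU_SOURCE            /* needed for the *_np calls and CPU_* macros */
#include <pthread.h>
#include <sched.h>

/* Hypothetical worker: records the CPU it finds itself running on. */
static void *report_cpu(void *arg)
{
    int *cpu_seen = arg;
    *cpu_seen = sched_getcpu();
    return NULL;
}

/* Returns the lowest-numbered CPU this process may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    return -1;
}

/*
 * Creates a thread whose affinity is set through its attribute object,
 * i.e. before the thread executes any code, then joins it.
 * Returns 0 on success or an errno-style code; *cpu_seen receives the
 * CPU the worker observed.
 */
int run_pinned(int cpu, int *cpu_seen)
{
    pthread_attr_t attr;
    pthread_t tid;
    cpu_set_t set;
    int rc;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, report_cpu, cpu_seen);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, NULL);
}
```

A caller would pick a core with `first_allowed_cpu()` and pass it to `run_pinned()`. Because the mask is applied at creation time, there is no window in which the scheduler can place the thread on another core, which is the property asked about for OS X above.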
Re: 3.9.5: much improved threading

Ian,

>> You will see only a small difference for 2-processor machines, but
>> on 4 and 8 processor machines, the new threaded code can more than
>> double your performance (assuming you aren't on a loser OS like
>> MacOS X or FreeBSD, that don't possess processor affinity).
>
>http://developer.apple.com/releasenotes/Performance/RN-AffinityAPI/

I gave this a quick scope, and it appears to be inadequate for what we need, if I translate this page correctly. It appears you guys are following the horrible convention of calling one package a processor, and that this page is then describing that you can use your affinity to ensure one thread/package, but you cannot ensure one thread/core. Is this the case? If so, you can definitely not do master last, and even persistent worker will be messed up due to the scheduler moving things around within a package.

Master last and processor affinity (processor == core, processor != package) can make a huge difference, as you can see:
   http://math-atlas.sourceforge.net/timing/newThr395/index.html

You can read about the techniques themselves in our IPDPS paper:
   http://www.cs.utsa.edu/~whaley/papers/ettIEEE.pdf

Is there any chance Apple is going to provide core-level affinity sometime soon?

Thanks,
Clint