[jira] Created: (RVM-341) Improved copying in VM_Memory

Improved copying in VM_Memory
-----------------------------

                 Key: RVM-341
                 URL: http://jira.codehaus.org/browse/RVM-341
             Project: RVM
          Issue Type: Improvement
          Components: Runtime
            Reporter: Ian Rogers
             Fix For: 2.9.3

r13857 improved memory copying for Intel with SSE2 so that we use 64-bit copies rather than 32-bit copies. This gave a large number of speedups:
http://jikesrvm.anu.edu.au/cattrack/results/rvmx86lnx32.anu.edu.au/perf/1790/performance_report
most notably on SpecJBB 2000. There is low-hanging fruit to improve this further, for example by using 128-bit copies and by using more than one register to do the copying.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Jikesrvm-issues mailing list
Jikesrvm-issues@...
https://lists.sourceforge.net/lists/listinfo/jikesrvm-issues
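The "wider copies, more than one register" idea in the description above can be illustrated in portable C. This is only a sketch under stated assumptions, not the code from r13857: `copy_words64` is a hypothetical name, and the fixed-size `memcpy` calls are there so the compiler can lower each one to a single 64-bit load or store without violating aliasing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy nbytes between non-overlapping regions,
 * moving two independent 64-bit words per iteration. */
static void copy_words64(void *dst, const void *src, size_t nbytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    assert(nbytes == 0 || (dst != NULL && src != NULL));

    /* Main loop: two 64-bit temporaries, so the loads do not depend on each other. */
    for (; i + 16 <= nbytes; i += 16) {
        uint64_t a, b;
        memcpy(&a, s + i, 8);
        memcpy(&b, s + i + 8, 8);
        memcpy(d + i, &a, 8);
        memcpy(d + i + 8, &b, 8);
    }

    /* Tail: copy the remaining 0..15 bytes one at a time. */
    for (; i < nbytes; i++) {
        d[i] = s[i];
    }
}
```

Whether two temporaries actually end up in two registers, or in XMM registers at all, is up to the compiler; the sketch only shows the shape of the loop.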
|
|
[jira] Updated: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Rogers updated RVM-341:
---------------------------

    Component/s: Instruction Architecture: Intel

There are more things in particular that we can do for Intel. Looking at:
http://cdrom.amd.com/devconn/events/AMD_block_prefetch_paper.pdf
we are using a 32-bit copy loop with a performance of around 640MB/s (at 2001 bus speeds, DDR2100), whereas the best copy loop achieves 1976MB/s, and this is without using 128-bit XMM registers.
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_113385 ]

Ian Rogers commented on RVM-341:
--------------------------------

It appears the best pair of copy instructions is movq to load and movntq to store. The movntq is a weakly ordered store, so an sfence is necessary at the end of the copy loop. Similarly, (p)xor-ing a register and then using movntq is the best way to zero memory. We can assume SSE. It would be nice to know whether there's an advantage in interleaving SSE XMM register movq/movntq instructions with MMX MM register ones.
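The load / non-temporal store / sfence pattern from the comment above can be sketched with compiler intrinsics. This is an illustration only: it uses the 128-bit SSE2 forms (`_mm_loadu_si128` and `_mm_stream_si128`) rather than the 64-bit MMX movq/movntq named in the comment, the helper names `stream_copy` and `stream_zero` are hypothetical, and the 16-byte alignment and size requirements are assumptions of the sketch. On targets without SSE2 it falls back to `memcpy`/`memset`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in _mm_sfence */
#endif

/* Hypothetical helper: copy nbytes using non-temporal stores.
 * Assumes dst is 16-byte aligned and nbytes is a multiple of 16. */
static void stream_copy(void *dst, const void *src, size_t nbytes)
{
    assert(((uintptr_t)dst & 15u) == 0 && (nbytes & 15u) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < nbytes / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i);  /* ordinary (cached) load */
        _mm_stream_si128(d + i, v);          /* weakly ordered, cache-bypassing store */
    }
    _mm_sfence();  /* order the streamed stores before any later stores */
#else
    memcpy(dst, src, nbytes);  /* portable fallback */
#endif
}

/* Hypothetical helper: zero nbytes using non-temporal stores of a zeroed register.
 * Same alignment and size assumptions as stream_copy. */
static void stream_zero(void *dst, size_t nbytes)
{
    assert(((uintptr_t)dst & 15u) == 0 && (nbytes & 15u) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i zero = _mm_setzero_si128();  /* usually compiled to pxor */
    for (size_t i = 0; i < nbytes / 16; i++) {
        _mm_stream_si128(d + i, zero);
    }
    _mm_sfence();
#else
    memset(dst, 0, nbytes);  /* portable fallback */
#endif
}
```

Non-temporal stores tend to pay off only for large regions that will not be read again soon; for small arrays the cached path is likely the better choice.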
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_113416 ]

Ian Rogers commented on RVM-341:
--------------------------------

There's specific coverage of using non-temporal stores and prefetching in section 9.7 of the Intel optimization manual:
http://www.intel.com/design/processor/manuals/248966.pdf
|
|
[jira] Updated: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Rogers updated RVM-341:
---------------------------

    Fix Version/s: (was: 2.9.4) 1000
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=178845#action_178845 ]

Filip Pizlo commented on RVM-341:
---------------------------------

Indeed, it is the case that we're not so great at memcopying. We're better than most non-dynamic Java implementations, but we get soundly destroyed by HotSpot-based systems.

I wrote a simple benchmark (to be attached shortly) that does arraycopies between non-overlapping arrays, with the length of the region being copied ranging between 0 and 999 elements. The arrays are char[]. My intuition is that small-ish char arrays have the biggest impact on performance of real benchmarks.

Here are the results. The tested VMs: HotSpot 1.5.0_18-b02 server, IcedTea6 1.4 1.6.0_0-b14 (64-bit) server, gcj 4.3.2, fVM 0.0.1 (http://www.fiji-systems.com/), and RVM r15698.

    HS:        9.8 sec
    IT 64-bit: 8.7 sec
    gcj:       24 sec
    fVM:       18.2 sec
    RVM:       16.5 sec

We beat the ahead-of-time VMs (gcj and fVM) but we get destroyed by the HotSpot-based server VMs.

Interestingly, a C program (also to be attached), which attempts to do the exact same thing while "emulating" the safety checks that Java arraycopy would have to do in the absence of heroic compiler magic, runs in 10.5 sec in 32-bit mode and 9 sec in 64-bit mode. Note that inspecting an assembly dump of the code shows that it just calls memcpy(), which, interestingly, doesn't have any of RVM's optimizations for static awareness of array alignment (8-bit, 16-bit, 32-bit, 64-bit). It has to do the equivalent of our arraycopy8Bit.

I included fVM because its implementation of arraycopy() is just a call to memcpy() on the fast path with the minimal safety checks (non-overlapping arrays, negative length, array bounds, etc). As in RVM, the type checks are statically taken care of. I think that fVM misses the same optimization opportunities as RVM (statically observing that the arrays are non-overlapping, trying to use super-special architecture and memory model knowledge to do something better than memcpy, etc). I don't know if what I can learn from fVM can be applied to RVM, but I'll include that in my investigation.

Bottom line: though arraycopy() is not the only thing that matters for performance of real benchmarks, it certainly does matter by a non-trivial amount, and if we're 70% slower for this crucial method, it may actually have a non-trivial impact on benchmark performance.
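For readers without access to the attachments, the kind of "emulated safety checks" described in the comment above might look roughly like the following. This is not the attached memcpyTestC.c; it is a minimal sketch in which the `char_array` struct and the `checked_arraycopy` function are hypothetical stand-ins for a Java char[] and for the checks arraycopy has to perform before the bulk copy. It uses `memmove` so that the sketch stays correct even if both arguments name the same array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a Java char[]: a length plus 16-bit elements. */
typedef struct {
    int32_t   length;
    uint16_t *data;
} char_array;

/* Copy len elements from src[src_pos..] to dst[dst_pos..].
 * Returns 0 on success and -1 where Java would throw an exception. */
static int checked_arraycopy(const char_array *src, int32_t src_pos,
                             char_array *dst, int32_t dst_pos, int32_t len)
{
    if (src == NULL || dst == NULL) {
        return -1;  /* null array */
    }
    if (src_pos < 0 || dst_pos < 0 || len < 0) {
        return -1;  /* negative index or length */
    }
    /* Written as subtractions so the comparisons cannot overflow. */
    if (len > src->length - src_pos || len > dst->length - dst_pos) {
        return -1;  /* range runs past the end of an array */
    }
    memmove(dst->data + dst_pos, src->data + src_pos,
            (size_t)len * sizeof(uint16_t));
    return 0;
}
```

A benchmark loop in the spirit of the comment would call this with lengths from 0 to 999 on two distinct arrays and time the total.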
|
|
[jira] Updated: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Filip Pizlo updated RVM-341:
----------------------------

    Attachment: memcpyTestC.c
                memcpyTest.java
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=178846#action_178846 ]

Ian Rogers commented on RVM-341:
--------------------------------

On new architectures the advice from Intel is clear: use "rep movsq". There is microcoded goodness to avoid unnecessary partial cache line fills. Intel still expects the stores to be aligned.
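For reference, "rep movsq" can be emitted from C with GCC/Clang extended inline assembly on x86-64. The sketch below is an illustration under assumptions, not RVM code: `copy_qwords` is a hypothetical name, the regions are assumed not to overlap, and the direction flag is assumed clear (as the System V ABI guarantees on function entry). Other targets fall back to `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `qwords` 8-byte words from src to dst.
 * Assumes non-overlapping regions and a forward copy (DF clear). */
static void copy_qwords(void *dst, const void *src, size_t qwords)
{
    assert(qwords == 0 || (dst != NULL && src != NULL));
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* RDI = destination, RSI = source, RCX = count; all three are
     * updated by the instruction, hence the "+" constraints. */
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
#else
    memcpy(dst, src, qwords * 8);  /* portable fallback */
#endif
}
```

Whether this beats a library memcpy depends heavily on the microarchitecture and the copy size, so it would need measuring on the machines of interest.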
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=178975#action_178975 ]

Steve Blackburn commented on RVM-341:
-------------------------------------

I have kicked off some very simple experiments by way of a limit study, comparing five systems:

 . hotspot 1.6 with aggressive runtime flags
 . our svn head
 . forcing RVMArray to _always_ use out-of-line naive java copying (no memcopy)
 . forcing RVMArray to _always_ use inline naive java copying (no memcopy)
 . forcing RVMArray to _always_ use memmove

I did this on both core 2 quad and i7. The data is slowly dribbling in; it will eventually include dacapo, jvm98 and pjbb2005. At time of writing the following observations come to mind:

1. all of our fancy footwork appears to buy us nothing over simply calling memmove (but wait for final results before drawing conclusions)
2. some of the benchmarks that appear to be using array copy most (the ones that show naive as particularly punishing, such as antlr, bloat & jython at time of writing) are also ones where hotspot is punishing us heavily.
3. for some reason memmove is killing jython. There is no stack trace in the logs, and I've not yet investigated further.

The data:
http://cs.anu.edu.au/~Steve.Blackburn/private/results/jikesrvm-performance-2009/arraycopy-i7/bmtime.jikes.html
http://cs.anu.edu.au/~Steve.Blackburn/private/results/jikesrvm-performance-2009/arraycopy-c2q/bmtime.jikes.html
http://cs.anu.edu.au/~Steve.Blackburn/private/results/jikesrvm-performance-2009/arraycopy-i7/bmtime.all.html
http://cs.anu.edu.au/~Steve.Blackburn/private/results/jikesrvm-performance-2009/arraycopy-c2q/bmtime.all.html
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=178981#action_178981 ]

David Grove commented on RVM-341:
---------------------------------

One benchmark to watch is jess. Array copies of very small Object[] are a key hotspot for this program. It's possible that what we should be doing is a hybrid strategy: going inline for things that are short, without bothering with fancy alignment code, and going out-of-line (memcopy) for things that are big.
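The hybrid strategy suggested in the comment above can be sketched as a size check in front of the two paths. Everything named here is hypothetical: `hybrid_copy` is an illustrative function, and the cutoff `INLINE_COPY_LIMIT` is an arbitrary placeholder that would have to be chosen by measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff in elements; a real value would come from benchmarking. */
enum { INLINE_COPY_LIMIT = 8 };

/* Copy n pointer-sized elements between non-overlapping regions. */
static void hybrid_copy(uintptr_t *dst, const uintptr_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    if (n <= INLINE_COPY_LIMIT) {
        /* Short case: a plain element loop, no alignment tricks. */
        for (size_t i = 0; i < n; i++) {
            dst[i] = src[i];
        }
    } else {
        /* Long case: hand the work to the platform-tuned library routine. */
        memcpy(dst, src, n * sizeof *dst);
    }
}
```

In C an optimizing compiler may itself turn the short loop into a `memcpy` call, so this shows the control flow rather than guaranteeing what gets emitted; in a JIT the inline path would be generated directly.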
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=178982#action_178982 ]

Ian Rogers commented on RVM-341:
--------------------------------

So we never really went very far with the games; ideally we'd have created a 128-bit unboxed type, used that for the copies, and batched up loads and stores. The furthest the games went was to recognize 64-bit load and store pairs and optimize them to use an XMM register. This may suffer a penalty on Nehalem. Anyway, Intel have done all the work in Nehalem: rep movs is the best instruction and now even tackles alignment problems (optimization manual 2.2.6). It would be trivial to create a magic to emit this instruction. Of course all of this is silly as the game changes for 64 bits.
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=178983#action_178983 ]

David Grove commented on RVM-341:
---------------------------------

The original motivation for all of the trickiness in array copy was to handle code like jess well (copying very small arrays). If we're doing a copy of a very small array, then the overhead of getting out to memcopy is significant. If the array is big, then going to memcopy is the right thing to do because it will be optimized for the target platform. It's likely that the mistake we've made is to try to optimize the inline case too much, especially in the sub-word case, thus making it slower.
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=179058#action_179058 ]

Steve Blackburn commented on RVM-341:
-------------------------------------

It looks for now as though memmove is a better bet than what we have, even on benchmarks that do small copies. memmove gives no significant slowdown on any benchmark (worst case 0.8% slowdown on jack where the 95% CI is 2.4%), and significantly speeds up javac. In particular, jess is not slowed down in any appreciable way. This is a bit disturbing given the relative complexity of what we have against the naivety of simply calling out to memmove.
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=179059#action_179059 ]

Filip Pizlo commented on RVM-341:
---------------------------------

That is disturbing.

Note that I got a speedup in fVM (on Fedora 10, x86_64) from just having a check to select either memcpy or memmove depending on whether trg==src. But in fVM there is zero overhead to making the call to memcpy/memmove, since I'm emitting C code. In RVM sysCalls may not be so cheap, so minute differences in performance between memcpy and memmove may not make any difference. Or are you making a call to memmove using some even-more-lowlevel approach? On ia32 it should be possible to call it directly in some cases...

In general, if we're running the compiler while hosted, it seems that sysCalls don't have to do the address lookup from BootRecord. I don't know if we do it already or not, or if it would matter at all.

Have you committed? If not, can you send me a patch? I'm playing around with this as well and doing my own perf comparisons. It would be interesting if we could compare results.

-F
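The memcpy-versus-memmove selection mentioned in the comment above can be sketched as follows. This is not fVM's code: `array_copy_bytes` and its parameters are hypothetical, and the sketch assumes, as holds for Java arrays, that two distinct arrays never share storage, so only a copy within one array can overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src_base[src_off..] to trg_base[trg_off..].
 * Assumes distinct base pointers refer to non-overlapping arrays. */
static void array_copy_bytes(unsigned char *trg_base, size_t trg_off,
                             const unsigned char *src_base, size_t src_off,
                             size_t n)
{
    assert(n == 0 || (trg_base != NULL && src_base != NULL));

    if (trg_base == src_base) {
        /* Same array: the ranges may overlap, so memmove is required. */
        memmove(trg_base + trg_off, src_base + src_off, n);
    } else {
        /* Different arrays: no overlap possible, memcpy is safe. */
        memcpy(trg_base + trg_off, src_base + src_off, n);
    }
}
```

The check costs one pointer comparison; whether that is visible next to the call overhead is exactly the question the comment raises for RVM's sysCalls.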
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=179063#action_179063 ]

Steve Blackburn commented on RVM-341:
-------------------------------------

All I did was a filthy hack to test the above. No intention of committing; I was just hearing Tony & Daniel discuss it and thought I'd throw some numbers into the mix :-)

For the memmove numbers, all I did was:

a) change sysCopy to use memmove (for the real deal I'd add a sysMove call, but I was lazy),
b) change all conditional calls to sysCopy to be unconditional.
|
|
[jira] Updated: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Blackburn updated RVM-341:
--------------------------------

Attachment: arraycopy-options.patch

Here's the very simple patch for the stuff I did. Obviously I undid some of the changes to generate the 3 variations on the head that I described above.
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=179065#action_179065 ]

Steve Blackburn commented on RVM-341:
-------------------------------------

Ugh. On reviewing the patch I see that I did not do it right :-/ Sorry. I forgot to change Memory.java. Will do so now and have results in a few hours. Apols for the misleading info.
|
|
[jira] Commented: (RVM-341) Improved copying in VM_Memory

[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=179088#action_179088 ]

Steve Blackburn commented on RVM-341:
-------------------------------------

The new numbers are appearing here:
http://cs.anu.edu.au/~Steve.Blackburn/private/results/jikesrvm-performance-2009/arraycopy-i7/bmtime.jikes.html

Dave was completely right. jess shows naive use of memmove to be a bad choice.

The "memmove 512" column is the same broken setup as shown earlier. What happens there is that we treat all copies as non-overlapping, but only call memmove when the copy is less than 512. While memmove is safe for overlapping arrays, the other Java code is not (it is not intended to be). So it is surprising that only jython crashes.

Sorry for wasting your time with these bogus numbers! Fortunately we can still derive something interesting from the data:

a) primitive array copy performance can significantly affect the bottom line in real benchmarks.
b) by looking at the "naive" numbers, we can now see which benchmarks are sensitive to array copy performance.
c) javac appears to do a lot of overlapping copies, so we win considerably by bypassing that logic (though of course to do so is incorrect).
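Editorial sketch: the reason a copy path written for non-overlapping ranges breaks on overlapping ones can be shown in a few lines of C. The functions below are hypothetical stand-ins, not the Jikes RVM Java code. `forward_copy` models a loop that assumes no overlap, and `overlap_safe_copy` models the direction check that memmove-style copying performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a copy loop that assumes the source and
 * destination ranges do not overlap: it always walks forward. */
void forward_copy(int *a, size_t src, size_t dst, size_t n)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++) {
        a[dst + i] = a[src + i];
    }
}

/* Overlap-safe variant: when the destination starts inside the source
 * range, copy backward so no element is overwritten before it is read. */
void overlap_safe_copy(int *a, size_t src, size_t dst, size_t n)
{
    assert(a != NULL);
    if (dst > src && dst < src + n) {
        for (size_t i = n; i > 0; i--) {
            a[dst + i - 1] = a[src + i - 1];
        }
    } else {
        forward_copy(a, src, dst, n);
    }
}
```

Starting from {1, 2, 3, 4, 5, 0} and copying four elements from index 0 to index 1, `forward_copy` produces {1, 1, 1, 1, 1, 0}, because each element is overwritten before it is read, while `overlap_safe_copy` produces the intended {1, 1, 2, 3, 4, 0}. A setup that routes overlapping copies to the forward path can therefore corrupt data silently instead of crashing.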