[ http://jira.codehaus.org/browse/RVM-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=178845#action_178845 ]
Filip Pizlo commented on RVM-341:
---------------------------------
Indeed, it is the case that we're not so great at memcopying. We're better than most non-dynamic Java implementations, but we get soundly destroyed by HotSpot-based systems.
I wrote a simple benchmark (to be attached shortly) that does arraycopies between non-overlapping arrays, with the length of the region being copied ranging between 0 and 999 elements. The arrays are char[]. My intuition is that small-ish char arrays have the biggest impact on performance of real benchmarks.
Here are the results. The tested VMs: HotSpot 1.5.0_18-b02 server, IcedTea6 1.4 1.6.0_0-b14 (64-bit) server, gcj 4.3.2, fVM 0.0.1 (http://www.fiji-systems.com/), and RVM r15698.
HS: 9.8 sec
IT 64-bit: 8.7 sec
gcj: 24 sec
fVM: 18.2 sec
RVM: 16.5 sec
We beat the ahead-of-time VMs (gcj and fVM) but we get destroyed by the HotSpot-based server VMs.
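For readers who want to try something similar before the attachment arrives, here is a minimal sketch of the kind of benchmark described above. It is not the attached program: the class name, the iteration counts, and the choice to sweep lengths 0..999 in order are all assumptions made for illustration. Only the use of two separate char[] arrays and the 0 to 999 length range come from the description.

```java
public class ArrayCopyBench {
    // Lengths range over 0..999, as described in the comment above.
    static final int MAX_LEN = 1000;

    /**
     * Performs `iterations` copies between two distinct (hence
     * non-overlapping) arrays, sweeping the length through 0..999.
     * Returns the total number of chars copied so the work is observable.
     * The sweep order is an assumption; the original may pick lengths differently.
     */
    static long run(char[] src, char[] dst, int iterations) {
        long copied = 0;
        for (int i = 0; i < iterations; i++) {
            int len = i % MAX_LEN;
            System.arraycopy(src, 0, dst, 0, len);
            copied += len;
        }
        return copied;
    }

    public static void main(String[] args) {
        char[] src = new char[MAX_LEN];
        char[] dst = new char[MAX_LEN];
        for (int i = 0; i < src.length; i++) {
            src[i] = (char) ('a' + i % 26);
        }

        // Hypothetical iteration counts; tune them for the VM under test.
        run(src, dst, 1_000_000); // warm-up so a JIT has a chance to compile run()

        long start = System.nanoTime();
        long copied = run(src, dst, 100_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(copied + " chars copied in " + elapsedMs + " ms");
    }
}
```

The warm-up call matters mostly for the adaptive VMs (HotSpot, RVM); for ahead-of-time systems such as gcj it should be harmless.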
Interestingly, a C program (also to be attached), which attempts to do the same exact thing, while "emulating" the safety checks that Java arraycopy would have to do in the absence of heroic compiler magic, runs in 10.5 sec in 32-bit mode and 9 sec in 64-bit mode. Note that inspecting an assembly dump of the code shows that it just calls memcpy(), which, interestingly, doesn't have any of RVM's optimizations for static awareness of array alignment (8-bit, 16-bit, 32-bit, 64-bit). It has to do the equivalent of our arraycopy8Bit.
I included fVM because its implementation of arraycopy() is just a call to memcpy() on the fast path with the minimal safety checks (non-overlapping arrays, negative length, array bounds, etc). Like RVM, the type checks are statically taken care of. I think that fVM misses the same optimization opportunities as RVM (statically observing that the arrays are non-overlapping, trying to use super-special architecture and memory model knowledge to do something better than memcpy, etc). I don't know if what I can learn from fVM can be applied to RVM, but I'll include that in my investigation.
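To make the list of checks concrete, here is a rough sketch of an arraycopy for char[] with the safety checks written out by hand. It is not fVM's or RVM's code: the class and method names are made up, the null check is an assumed part of the "etc", and the plain forward loop merely stands in for the memcpy() call on the fast path.

```java
public class CheckedCopy {
    /**
     * Copies `len` chars from src[srcPos..] to dst[dstPos..] after performing
     * the checks a Java arraycopy needs when the compiler cannot prove them away.
     * Type checks are omitted because both arrays are statically char[].
     */
    static void copy(char[] src, int srcPos, char[] dst, int dstPos, int len) {
        if (src == null || dst == null) {
            throw new NullPointerException();
        }
        // Negative length and array bounds. Written as subtractions so that
        // srcPos + len cannot overflow before it is compared.
        if (srcPos < 0 || dstPos < 0 || len < 0
                || len > src.length - srcPos
                || len > dst.length - dstPos) {
            throw new ArrayIndexOutOfBoundsException();
        }
        // Overlap is only possible when both references name the same array.
        if (src == dst && srcPos < dstPos && dstPos < srcPos + len) {
            // Slow path: destination starts inside the source region,
            // so copy backward to avoid clobbering unread elements.
            for (int i = len - 1; i >= 0; i--) {
                dst[dstPos + i] = src[srcPos + i];
            }
        } else {
            // Fast path: stands in for the memcpy() call.
            for (int i = 0; i < len; i++) {
                dst[dstPos + i] = src[srcPos + i];
            }
        }
    }
}
```

If the compiler can prove statically that the two arrays are distinct, the overlap test disappears entirely, which is one of the optimization opportunities mentioned above.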
Bottom line: though arraycopy() is not the only thing that matters for performance of real benchmarks, it certainly does matter by a non-trivial amount, and if we're 70% slower for this crucial method, it may actually have a non-trivial impact on benchmark performance.
> Improved copying in VM_Memory
> -----------------------------
>
> Key: RVM-341
> URL: http://jira.codehaus.org/browse/RVM-341
> Project: RVM
> Issue Type: Improvement
> Components: Instruction Architecture: Intel, Runtime
> Reporter: Ian Rogers
> Fix For: 1000
>
>
> r13857 improved memory copying for Intel with SSE2 so that we used 64bit copies rather than 32bit copies. This gave a large number of speed ups:
> http://jikesrvm.anu.edu.au/cattrack/results/rvmx86lnx32.anu.edu.au/perf/1790/performance_report
> most notably on SpecJBB 2000. There is a low-hanging fruit to improve this further, for example, by using 128bit copies and using more than 1 register to do the copying.
--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira
_______________________________________________
Jikesrvm-issues mailing list
Jikesrvm-issues@...
https://lists.sourceforge.net/lists/listinfo/jikesrvm-issues